PROTEOME-WIDE MAPPING OF POST-TRANSLATIONAL

MODIFICATIONS USING ENDONUCLEASES

CROSS-REFERENCES TO RELATED APPLICATIONS 
[0001] The present application claims priority to U.S. Provisional Patent Application No.
60/405,589, filed August 14, 2002, the disclosure of which is incorporated herein in its
entirety for all purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY
SPONSORED RESEARCH AND DEVELOPMENT
[0002] The present invention was supported by a grant from the National Institutes of
Health (CA 70031). The Government may have rights in this invention.

BACKGROUND OF THE INVENTION 
[0003] Protein post-translational modification is one of the dominant mechanisms of
information transfer in cells. A major goal of current proteomic efforts is to generate a
system level map describing all the sites of protein post-translational modification. Recent
effort toward this goal has focused on developing new technologies for enriching and
quantitating phosphopeptides. By contrast, identification of the sites of phosphorylation
typically relies exclusively on the use of tandem mass spectrometry to sequence individual
peptides.

[0004] Much of the complexity of higher organisms is believed to reside in the specific
post-translational modification of proteins (Venter et al., Science, 2001, 291(5507):
1304-51). Protein phosphorylation is the most ubiquitous such modification; almost 2% of the
human genome encodes protein kinases and an estimated one-third of all proteins contain a
covalently bound phosphate group (Manning et al., Science, 2002, 298(5600): 1912-34).
Due to the importance of protein phosphorylation in regulating cellular signaling events,
there is intense interest in developing technologies for mapping phosphorylation events on a
proteome-wide scale.

[0005] Existing approaches for phosphorylation site mapping rely almost exclusively on
the use of tandem mass spectrometry (MS/MS) to sequence individual peptides in order to
localize sites of phosphorylation. Despite the power of this approach, MS/MS of
phosphopeptides remains challenging due to (i) the signal suppression of phosphate-containing
molecules in the commonly used positive detection mode, (ii) the difficulty in
achieving full sequence coverage, especially for long peptides, peptides present in low
abundance, and peptides phosphorylated at sub-stoichiometric levels - all of which are
common for phosphopeptides, (iii) the difficulty in localizing the phosphoamino acid within
an MS/MS spectrum due to the inherent lability of the phosphate group, and (iv) the inability
to distinguish between distinct phosphoisoforms of a single polypeptide that may coexist in a
biological sample (McLachlin et al., Curr Opin Chem Biol, 2001, 5(5): 591-602; Mann et al.,
Trends Biotechnol, 2002, 20(6): 261-8; Zhou et al., Nat Biotechnol, 2001, 19(4): 375-8; Oda
et al., Nat Biotechnol, 2001, 19(4): 379-82; Steen et al., J Am Soc Mass Spectrom, 2002,
13(8): p. 996-1003). The challenge of mapping phosphorylation sites is highlighted by recent
efforts to enrich phosphopeptides from complex mixtures. While these strategies have
provided powerful tools for purifying phosphopeptides, the next step - identifying the precise
site of phosphorylation - often fails for many of the peptides that are recovered.

[0006] Currently, the first step in mapping the phosphorylation sites of a protein is to digest
the phosphoprotein with a protease (e.g., trypsin) that generates smaller peptide fragments for
sequencing. We reasoned that this process would be more informative if a protease that
specifically cleaved its substrates at the site of phosphorylation were used. Such a digestion
would selectively hydrolyze the amide bond adjacent to each phosphorylated residue,
facilitating identification of the phosphorylation site directly from the cleavage pattern (e.g.,
from an MS 'fingerprint' specifying the exact masses of the cleavage products).
Phosphospecific cleavage would also facilitate the interpretation of MS/MS spectra, since the
C-terminal residue would always be the formerly phosphorylated residue, resulting in a
unique y1 ion. In this regard, it is often possible to obtain tandem mass spectra of a
phosphopeptide, but still fail to localize the phosphoamino acid within that sequence.
Presently, no protease is known that selectively recognizes a phosphorylated amino acid, or
any other post-translational modification.
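The MS 'fingerprint' idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that cleavage occurs on the C-terminal side of each phosphorylated residue and that the phosphate group (HPO3) stays on the resulting fragment; the function name, the `retain_phosphate` switch, and the example peptide are hypothetical and do not come from the specification. Residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056      # added once per free peptide
PHOSPHATE = 79.96633  # HPO3 carried by a phosphorylated residue


def phosphospecific_fragments(sequence, phospho_sites, retain_phosphate=True):
    """Return (peptide, neutral mass) pairs for cleavage after each phospho site.

    phospho_sites holds 1-based residue positions. Assumption: the bond on
    the C-terminal side of each phosphorylated residue is hydrolyzed.
    """
    sites = set(phospho_sites)
    # A site at the last residue produces no cut, only a phosphate mass shift.
    cuts = sorted(p for p in sites if 1 <= p < len(sequence))
    bounds = [0] + cuts + [len(sequence)]
    fragments = []
    for start, end in zip(bounds, bounds[1:]):
        peptide = sequence[start:end]
        mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
        if retain_phosphate and end in sites:
            mass += PHOSPHATE
        fragments.append((peptide, round(mass, 4)))
    return fragments


if __name__ == "__main__":
    # Hypothetical peptide with a phosphotyrosine at position 3.
    for peptide, mass in phosphospecific_fragments("AGYLK", [3]):
        print(peptide, mass)
```

Each fragment other than the last ends in the modified residue, so the list of fragment masses locates the modification site without sequencing the peptide.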

[0007] A method to address this problem utilizing a strategy for specific proteolysis at sites
of post-translational modification, such as phosphorylation, would represent a significant
advance in the art. The present invention satisfies this and other needs.
BRIEF SUMMARY OF THE INVENTION
[0008] The present invention provides novel endonucleases for use in mapping
post-translational modification sites in a genome, such as the human genome. The present
invention provides endonucleases that, surprisingly, site-specifically cleave a
post-translationally modified polypeptide at a site of post-translational modification.

[0009] In a first aspect, the invention provides a method of mapping the sites of
polypeptide post-translational modifications. The method includes site-specifically cleaving
a peptide bond of the post-translationally modified polypeptide with an endopeptidase at a
site of post-translational modification to produce a degraded post-translationally modified
polypeptide. After cleavage at the site of post-translational modification, the site of
post-translational modification is determined.

[0010] In another aspect, the present invention provides an endopeptidase that
site-specifically cleaves a peptide bond of a post-translationally modified polypeptide at a site of
post-translational modification, wherein the endopeptidase comprises an active site that binds
to said post-translational modification.

[0011] In another aspect, the endopeptidases of the present invention are produced by a
method that includes introducing one or more point mutations into a model endopeptidase at
one or more candidate amino acid positions in an active site of the model endopeptidase to
produce a plurality of candidate endopeptidases. At least one of the plurality of the candidate
endopeptidases is an endopeptidase of the present invention that site-specifically cleaves a
peptide bond of a post-translationally modified polypeptide at a site of post-translational
modification. The endopeptidase that site-specifically cleaves at said site of
post-translational modification is identified by contacting each of the plurality of candidate
endopeptidases with the post-translationally modified polypeptide to determine whether or
not each candidate endopeptidase site-specifically cleaves the peptide bond of the
polypeptide at the site of a post-translational modification.

[0012] In another aspect, the present invention provides an isolated nucleic acid encoding an
endopeptidase which site-specifically cleaves a peptide bond of a post-translationally
modified polypeptide at a site of post-translational modification and which comprises one or
more point mutations at one or more amino acid positions within the endopeptidase active
site. The isolated nucleic acid contains a subsequence having at least 70% nucleic acid
sequence identity to a nucleic acid sequence of Figure 2.

[0013] In another aspect, the present invention provides an isolated nucleic acid encoding an
endopeptidase which site-specifically cleaves a peptide bond of a post-translationally
modified polypeptide at a site of post-translational modification and which comprises one or
more point mutations at one or more amino acid positions within the endopeptidase active
site. The isolated nucleic acid hybridizes under highly stringent hybridization conditions to a
nucleic acid sequence of Figure 2, wherein the hybridization reaction is incubated at 42°C in
a solution comprising 50% formamide, 5x SSC and 1% SDS, and washed at 65°C in a
solution comprising 0.2x SSC and 0.1% SDS.

BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Figure 1 is an amino acid sequence of a subtilisin model endopeptidase.

[0015] Figure 2 is a nucleic acid sequence that encodes a subtilisin model endopeptidase.

[0016] Figure 3 illustrates a comparison of a computer generated three-dimensional
structure of the model subtilisin and a phosphotyrosine polypeptide.

[0017] Figure 4 illustrates the phosphotyrosine site-specificity of candidate subtilisin
endopeptidases and the model subtilisin endopeptidase against either an unmodified tyrosine
or phenylalanine.

[0018] Figure 5 shows kinetic data for the site-specific cleavage at a phosphotyrosine by a
subtilisin endopeptidase containing the substitution point mutations P129G and E156R.

[0019] Figure 6 shows kinetic data for the site-specific cleavage at a phosphotyrosine by a
subtilisin endopeptidase containing the substitution point mutations G127S and E156R.

[0020] Figure 7 is an amino acid sequence of a subtilisin model endopeptidase containing a
signal sequence (in bold) and a pro-domain (underlined).

[0021] Figure 8 is a nucleic acid sequence that encodes a subtilisin model endopeptidase
containing a signal sequence (in bold) and a pro-domain (underlined).

DETAILED DESCRIPTION OF THE INVENTION
[0022] In contrast to presently utilized methods of developing a system level map
describing all the sites of post-translational peptide modification, e.g., polypeptide
phosphorylation, the present invention provides an approach for post-translational
modification mapping that makes it possible to enzymatically interrogate a protein sequence
directly to identify sites of post-translational modification.
Definitions



[0023] The term "point mutation" refers to a deletion, addition, or substitution at a designated
amino acid position in an amino acid or nucleotide sequence. Preferably, the term refers to
an amino acid substitution.
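Substitution point mutations are written in this document in the form "E156R" (original residue, position, replacement residue). As a minimal sketch of how such labels can be handled programmatically, the following Python code parses a label and applies it to a sequence. The function names and the short example sequence are hypothetical; the example sequence is not the subtilisin sequence of Figure 1.

```python
import re


def parse_substitution(label):
    """Split a label such as 'E156R' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, labels):
    """Apply substitution point mutations (1-based positions) to a sequence.

    Raises ValueError if a label does not match the residue actually
    present at the stated position.
    """
    residues = list(sequence)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"{label}: found {residues[position - 1]} instead")
        residues[position - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical five-residue sequence, for illustration only.
    print(apply_substitutions("MKPEG", ["P3G", "E4R"]))
```

Checking the original residue before substituting guards against numbering mismatches, for example between a mature enzyme and a precursor that still carries a signal sequence and pro-domain.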

[0024] "Candidate amino acid position" refers to an amino acid position in the active site of
a model endopeptidase that is selected for deletion or substitution of the amino acid at the
position or for addition of an amino acid at the position. The selection of the candidate amino
acid position may be at random or rationally based. Preferably, the selection is rationally
based on a comparison between three-dimensional structures of the model endopeptidase
active site and the post-translationally modified polypeptide.

[0025] "Nucleic acid" refers to deoxyribonucleotides or ribonucleotides and polymers
thereof in single- or double-stranded form, or complements thereof. The term encompasses
nucleic acids containing known nucleotide analogs or modified backbone residues or
linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have
similar binding properties as the reference nucleic acid, and which are metabolized in a
manner similar to the reference nucleotides. Examples of such analogs include, without
limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl
phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs). Nucleic acids also
include complementary nucleic acids.

[0026] Unless otherwise indicated, a particular nucleic acid sequence also implicitly
encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions)
and complementary sequences, as well as the sequence explicitly indicated. Specifically,
degenerate codon substitutions may be achieved by generating sequences in which the third
position of one or more selected (or all) codons is substituted with mixed-base and/or
deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al.,
J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The
term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, and
polynucleotide.

[0027] A particular nucleic acid sequence also implicitly encompasses "splice variants."
Similarly, a particular protein encoded by a nucleic acid implicitly encompasses any protein
encoded by a splice variant of that nucleic acid. "Splice variants," as the name suggests, are
products of alternative splicing of a gene. After transcription, an initial nucleic acid transcript
may be spliced such that different (alternate) nucleic acid splice products encode different
polypeptides. Mechanisms for the production of splice variants vary, but include alternate
splicing of exons. Alternate polypeptides derived from the same nucleic acid by read-through
transcription are also encompassed by this definition. Any products of a splicing reaction,
including recombinant forms of the splice products, are included in this definition.

[0028] "Conservatively modified variants" applies to both amino acid and nucleic acid
sequences. With respect to particular nucleic acid sequences, conservatively modified
variants refers to those nucleic acids which encode identical or essentially identical amino
acid sequences, or where the nucleic acid does not encode an amino acid sequence, to
essentially identical sequences. Because of the degeneracy of the genetic code, a large
number of functionally identical nucleic acids encode any given protein. For instance, the
codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every
position where an alanine is specified by a codon, the codon can be altered to any of the
corresponding codons described without altering the encoded polypeptide. Such nucleic acid
variations are "silent variations," which are one species of conservatively modified variations.
Every nucleic acid sequence herein which encodes a polypeptide also describes every
possible silent variation of the nucleic acid. One of skill will recognize that each codon in a
nucleic acid (except AUG, which is ordinarily the only codon for methionine, and UGG,
which is ordinarily the only codon for tryptophan) can be modified to yield a functionally
identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a
polypeptide is implicit in each described sequence with respect to the expression product, but
not with respect to actual probe sequences.
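The degeneracy described above can be made concrete with a short Python sketch that builds the standard genetic code, lists the synonymous codons for an amino acid, and counts how many distinct RNA sequences encode a given peptide. The function names are hypothetical, and the standard nuclear genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks stop.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return the sorted list of codons that encode one amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_encodings(peptide):
    """Count distinct coding sequences (silent variants included) for a peptide."""
    total = 1
    for amino_acid in peptide:
        total *= len(synonymous_codons(amino_acid))
    return total


if __name__ == "__main__":
    print(synonymous_codons("A"))   # the four alanine codons
    print(count_encodings("MAW"))   # Met and Trp each have a single codon
```

Because methionine and tryptophan each have a single codon, only the alanine position contributes alternatives in the example peptide, which matches the AUG and UGG exception noted in the text.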

[0029] As to amino acid sequences, one of skill will recognize that individual substitutions,
deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which
alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded
sequence is a "conservatively modified variant" where the alteration results in the substitution
of an amino acid with a chemically similar amino acid. Conservative substitution tables
providing functionally similar amino acids are well known in the art. Such conservatively
modified variants are in addition to and do not exclude polymorphic variants, interspecies
homologs, and alleles of the invention.

[0030] The following eight groups each contain amino acids that are conservative
substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M)
(see, e.g., Creighton, Proteins (1984)).
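The eight groups listed above can be encoded as a simple lookup. The following Python sketch checks whether two residues, or two equal-length peptides, differ only by substitutions within those groups. The function names are hypothetical, and treating identical residues as trivially conservative is an assumption made for convenience.

```python
# The eight conservative substitution groups listed above, by one-letter code.
CONSERVATIVE_GROUPS = [
    set("AG"), set("DE"), set("NQ"), set("RK"),
    set("ILMV"), set("FYW"), set("ST"), set("CM"),
]


def is_conservative(residue_a, residue_b):
    """True if the residues are identical or share at least one group."""
    if residue_a == residue_b:
        return True
    return any(
        residue_a in group and residue_b in group
        for group in CONSERVATIVE_GROUPS
    )


def differ_only_conservatively(peptide_a, peptide_b):
    """True if equal-length peptides differ only by conservative substitutions."""
    if len(peptide_a) != len(peptide_b):
        return False
    return all(is_conservative(a, b) for a, b in zip(peptide_a, peptide_b))


if __name__ == "__main__":
    print(is_conservative("D", "E"))
    print(differ_only_conservatively("ADKS", "GERT"))
```

Methionine appears in two groups, so the relation is not transitive: methionine pairs with both cysteine and isoleucine, but cysteine and isoleucine do not pair with each other.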



[0031] Macromolecular structures such as polypeptide structures can be described in terms
of various levels of organization. For a general discussion of this organization, see, e.g.,
Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel,
Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980).
"Primary structure" refers to the amino acid sequence of a particular peptide. "Secondary
structure" refers to locally ordered, three dimensional structures within a polypeptide. These
structures are commonly known as domains. Domains are portions of a polypeptide that
form a compact unit of the polypeptide and are typically about 18 to 350 amino acids long,
e.g., the transmembrane regions, pore loop domain, and the C-terminal tail domain. Typical
domains are made up of sections of lesser organization such as stretches of β-sheet and
α-helices. "Tertiary structure" refers to the complete three dimensional structure of a
polypeptide monomer. "Quaternary structure" refers to the three dimensional structure
formed by the noncovalent association of independent tertiary units. Anisotropic terms are
also known as energy terms.

[0032] The term "recombinant" when used with reference, e.g., to a cell, or nucleic acid,
protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by
the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic
acid or protein, or that the cell is derived from a cell so modified. Thus, for example,
recombinant cells express genes that are not found within the native (non-recombinant) form
of the cell or express native genes that are otherwise abnormally expressed, under expressed
or not expressed at all.

[0033] An "expression vector" is a nucleic acid construct, generated recombinantly or
synthetically, with a series of specified nucleic acid elements that permit transcription of a
particular nucleic acid in a host cell. The expression vector can be part of a plasmid, virus, or
nucleic acid fragment. Typically, the expression vector includes a nucleic acid to be
transcribed operably linked to a promoter.

[0034] The terms "identical" or percent "identity," in the context of two or more nucleic
acids or polypeptide sequences, refer to two or more sequences or subsequences that are the
same or have a specified percentage of amino acid residues or nucleotides that are the same
(i.e., 60% identity, preferably 65%, 70%, 75%, 80%, 85%, 90%, or 95% identity over a
specified region), when compared and aligned for maximum correspondence over a
comparison window, or designated region as measured using one of the following sequence
comparison algorithms or by manual alignment and visual inspection. Such sequences are
then said to be "substantially identical." This definition also refers to the complement of a test
sequence. Preferably, the identity exists over a region that is at least about 25 amino acids or
nucleotides in length, or more preferably over a region that is 50-100 amino acids or
nucleotides in length.
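As a minimal sketch of the percent identity calculation, the following Python function takes two sequences that have already been aligned (gaps written as '-') and reports the percentage of identical positions over a designated region. The function name and the choice of denominator (all aligned columns in the region, excluding columns that are gaps in both sequences) are assumptions; alignment programs differ on this convention.

```python
def percent_identity(aligned_a, aligned_b, start=0, end=None):
    """Percent identity of two pre-aligned sequences over columns [start, end).

    Gaps are written as '-'. Columns that are gaps in both sequences are
    ignored; a gap opposite a residue counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a[start:end], aligned_b[start:end])
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned nucleotide sequences.
    print(percent_identity("ACGTAC-T", "ACGAACGT"))
```

The `start` and `end` arguments correspond to the "specified region" in the definition, so the same alignment can be scored over a 25-residue window or over its full length.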

[0035] For sequence comparison, typically one sequence acts as a reference sequence, to
which test sequences are compared. When using a sequence comparison algorithm, test and
reference sequences are entered into a computer, subsequence coordinates are designated, if
necessary, and sequence algorithm program parameters are designated. Default program
parameters can be used, or alternative parameters can be designated. The sequence
comparison algorithm then calculates the percent sequence identities for the test sequences
relative to the reference sequence, based on the program parameters. For sequence
comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the
default parameters discussed below are used.

[0036] A "comparison window," as used herein, includes reference to a segment of any one
of the number of contiguous positions selected from the group consisting of from 20 to 600,
usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may
be compared to a reference sequence of the same number of contiguous positions after the
two sequences are optimally aligned. Methods of alignment of sequences for comparison are
well-known in the art. Optimal alignment of sequences for comparison can be conducted,
e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981),
by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970),
by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA
85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT,
FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer
Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection (see,
e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).
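To illustrate the local homology approach named above, here is a minimal Python sketch of Smith-Waterman scoring with a linear gap penalty. It returns only the best local alignment score, not the alignment itself. The function name and the scoring values (match 2, mismatch -1, gap -2) are illustrative assumptions, not parameters taken from the specification or from any of the cited programs.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (linear gap penalty)."""
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0]
        for j in range(1, len(seq_b) + 1):
            diagonal = previous[j - 1] + (
                match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            )
            # Clamping at zero lets a local alignment restart anywhere.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


if __name__ == "__main__":
    # Hypothetical sequences sharing the local segment "ACG".
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Keeping only two rows of the scoring matrix is enough when just the score is needed; recovering the aligned segment would require storing the full matrix and tracing back from the best cell.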

[0037] Examples of algorithms that are suitable for determining percent sequence identity
and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in
Altschul et al., Nuc. Acids Res. 25:3389-3402 (1977) and Altschul et al., J. Mol. Biol.
215:403-410 (1990), respectively. BLAST and BLAST 2.0 are used, with the parameters
described herein, to determine percent sequence identity for the nucleic acids and proteins of
the invention. Software for performing BLAST analyses is publicly available through the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This
algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short
words of length W in the query sequence, which either match or satisfy some positive-valued
threshold score T when aligned with a word of the same length in a database sequence. T is
referred to as the neighborhood word score threshold (Altschul et al., supra). These initial
neighborhood word hits act as seeds for initiating searches to find longer HSPs containing
them. The word hits are extended in both directions along each sequence for as far as the
cumulative alignment score can be increased. Cumulative scores are calculated using, for
nucleotide sequences, the parameters M (reward score for a pair of matching residues; always
> 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences,
a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each
direction is halted when: the cumulative alignment score falls off by the quantity X from its
maximum achieved value; the cumulative score goes to zero or below, due to the
accumulation of one or more negative-scoring residue alignments; or the end of either
sequence is reached. The BLAST algorithm parameters W, T, and X determine the
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences)
uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a
comparison of both strands. For amino acid sequences, the BLASTP program uses as
defaults a wordlength of 3, and expectation (E) of 10, and the BLOSUM62 scoring matrix
(see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)) alignments (B) of
50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
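The extension rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the BLAST implementation: it extends an exactly matching seed in one direction only, without gaps, using the nucleotide reward M=5 and penalty N=-4 quoted in the text. The function name and the default drop-off value for X are hypothetical.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 reward=5, penalty=-4, x_drop=20):
    """Ungapped rightward extension of a seed; returns (best score, length).

    The seed of word_len residues is assumed to match exactly. Extension
    stops when the score falls x_drop below its maximum, when the score
    reaches zero or below, or when either sequence ends.
    """
    score = reward * word_len
    best_score, best_len = score, word_len
    offset = word_len
    while q_start + offset < len(query) and s_start + offset < len(subject):
        same = query[q_start + offset] == subject[s_start + offset]
        score += reward if same else penalty
        if score > best_score:
            best_score, best_len = score, offset + 1
        if score <= 0 or best_score - score >= x_drop:
            break
        offset += 1
    return best_score, best_len


if __name__ == "__main__":
    # Hypothetical sequences: an 8-residue match followed by mismatches.
    print(extend_right("ACGTACGTAA", "ACGTACGTCC", 0, 0, word_len=4))
```

A full word-hit extension would run the same loop leftward from the seed and combine both results into one high scoring segment pair.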

[0038] The BLAST algorithm also performs a statistical analysis of the similarity between
two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5787
(1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum
probability (P(N)), which provides an indication of the probability by which a match between
two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid
is considered similar to a reference sequence if the smallest sum probability in a comparison
of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably
less than about 0.01, and most preferably less than about 0.001.

[0039] An indication that two nucleic acid sequences or polypeptides are substantially
identical is that the polypeptide encoded by the first nucleic acid is immunologically cross
reactive with the antibodies raised against the polypeptide encoded by the second nucleic
acid, as described below. Thus, a polypeptide is typically substantially identical to a second
polypeptide, for example, where the two peptides differ only by conservative substitutions.
Another indication that two nucleic acid sequences are substantially identical is that the two
molecules or their complements hybridize to each other under stringent conditions, as
described below. Yet another indication that two nucleic acid sequences are substantially
identical is that the same primers can be used to amplify the sequence.

[0040] The phrase "selectively (or specifically) hybridizes to" refers to the binding,
duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under
stringent hybridization conditions when that sequence is present in a complex mixture (e.g.,
total cellular or library DNA or RNA).

[0041] The phrase "stringent hybridization conditions" refers to conditions under which a
probe will hybridize to its target subsequence, typically in a complex mixture of nucleic
acids, but to no other sequences. Stringent conditions are sequence-dependent and will be
different in different circumstances. Longer sequences hybridize specifically at higher
temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen,
Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Probes,
"Overview of principles of hybridization and the strategy of nucleic acid assays" (1993).
Generally, stringent conditions are selected to be about 5-10°C lower than the thermal
melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the
temperature (under defined ionic strength, pH, and nucleic concentration) at which 50% of
the probes complementary to the target hybridize to the target sequence at equilibrium (as the
target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
Stringent conditions may also be achieved with the addition of destabilizing agents such as
formamide. For selective or specific hybridization, a positive signal is at least two times
background, preferably 10 times background hybridization. Exemplary stringent
hybridization conditions can be as following: 50% formamide, 5x SSC, and 1% SDS,
incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and
0.1% SDS at 65°C.

[0042] Nucleic acids that do not hybridize to each other under stringent conditions are still
substantially identical if the polypeptides which they encode are substantially identical. This
occurs, for example, when a copy of a nucleic acid is created using the maximum codon
degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize
under moderately stringent hybridization conditions. Exemplary "moderately stringent
hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl,
1% SDS at 37°C, and a wash in 1X SSC at 45°C. A positive hybridization is at least twice
background. Those of ordinary skill will readily recognize that alternative hybridization and
wash conditions can be utilized to provide conditions of similar stringency. Additional
guidelines for determining hybridization parameters are provided in numerous references, e.g.,
Current Protocols in Molecular Biology, ed. Ausubel, et al.

[0043] For PCR, a temperature of about 36°C is typical for low stringency amplification,
although annealing temperatures may vary between about 32°C and 48°C depending on
primer length. For high stringency PCR amplification, a temperature of about 62°C is
typical, although high stringency annealing temperatures can range from about 50°C to about
65°C, depending on the primer length and specificity. Typical cycle conditions for both high
and low stringency amplifications include a denaturation phase of 90°C - 95°C for 30 sec - 2
min., an annealing phase lasting 30 sec. - 2 min., and an extension phase of about 72°C for
1 - 2 min. Protocols and guidelines for low and high stringency amplification reactions are
provided, e.g., in Innis et al. (1990) PCR Protocols, A Guide to Methods and Applications,
Academic Press, Inc. N.Y.).

[0044] The terms "isolated," "purified," or "biologically pure" refer to material that is
substantially or essentially free from components that normally accompany it as found in its
native state. Purity and homogeneity are typically determined using analytical chemistry
techniques such as polyacrylamide gel electrophoresis or high performance liquid
chromatography. A protein that is the predominant species present in a preparation is
substantially purified.

[0045] "Polypeptide" refers to a polymer in which the monomers are amino acids and are
joined together through amide bonds, alternatively referred to as a "peptide." The terms
"peptide" and "polypeptide" encompass proteins. Unnatural amino acids, for example,
β-alanine, phenylglycine and homoarginine are also included under this definition. Amino
acids that are not gene-encoded may also be used in the present invention. Furthermore,
amino acids that have been modified to include reactive groups may also be used in the
invention. All of the amino acids used in the present invention may be either the D- or
L-isomer. The L-isomers are generally preferred. In addition, other peptidomimetics are also
useful in the present invention. For a general review, see, Spatola, A. F., in CHEMISTRY AND
BIOCHEMISTRY OF AMINO ACIDS, PEPTIDES AND PROTEINS, B. Weinstein, eds., Marcel
Dekker, New York, p. 267 (1983).
[0046] A "degraded post-translationally modified polypeptide" refers to the polypeptide
fragments produced by site-specifically cleaving a post-translationally modified polypeptide
at a site of post-translational modification using an endonuclease of the present invention.

[0047] The term "fragmentation pattern" refers to the configuration of the polypeptide
fragments of the degraded post-translationally modified polypeptide as visualized or
produced by an analytical method. A variety of analytical methods may be used to provide a
fragmentation pattern. For example, where the analytical method is mass spectrometry, the
fragmentation pattern is referred to as a "mass spectral fragmentation pattern." Where the
analytical method is two-dimensional electrophoresis, the fragmentation pattern is referred to
as a "two-dimensional electrophoretic fragmentation pattern."

[0048] The term "amino acid" refers to naturally occurring and synthetic amino acids, as
well as amino acid analogs and amino acid mimetics that function in a manner similar to the
naturally occurring amino acids. Naturally occurring amino acids are those encoded by the
genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline,
γ-carboxyglutamate, and O-phosphoserine. Amino acid analogs refers to compounds that have
the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is
bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine,
norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified
R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical
structure as a naturally occurring amino acid. "Amino acid mimetics" refers to chemical
compounds that have a structure that is different from the general chemical structure of an
amino acid, but that functions in a manner similar to a naturally occurring amino acid.

[0049] "Solid support," as used herein refers to a material that is substantially insoluble in a
selected solvent system, or which can be readily separated (e.g., precipitation) from a
selected solvent system in which it is soluble. Solid supports useful in practicing the present
invention can include groups that are activated or capable of activation to allow selected
species to be bound to the solid support. A solid support can also be a substrate, for example,
a chip, wafer or well, onto which an individual, or more than one compound, of the invention
is bound.

[0050] By "host cell" is meant a cell that contains an expression vector and supports the
replication or expression of the expression vector. Host cells may be prokaryotic cells such
as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or mammalian cells such as
CHO, HeLa and the like, e.g., cultured cells, explants, and cells in vivo.

Introduction 

[0051] One surprise of the human genome sequence was that there are far fewer genes than
many had predicted. Instead, much of the complexity of higher organisms is predicted to
reside in the specific modification of proteins, and piecing together this extraordinarily
complex web of post-translational modifications is one of the great remaining frontiers in
biology. For example, phosphorylation is the most ubiquitous and important of these
modifications (one-third of all cellular proteins contain covalently bound phosphate), and
understanding the molecular logic of protein phosphorylation will be a major step toward
decoding biological processes. New tools that will aid in the understanding of
post-translational modifications on a genome wide scale are needed. In view of the importance of
phosphorylation, the present invention is illustrated by reference to ascertaining the
phosphorylation pattern of a peptide. The focus on phosphorylation is for clarity of
illustration and does not limit the scope of the invention.

Mapping the Sites of Polypeptide Post-Translational Modifications

[0052] In a first aspect, the invention provides a method of mapping a site of polypeptide
post-translational modifications. The method includes site-specifically cleaving a peptide
bond of the post-translationally modified polypeptide with an endopeptidase at a site of
post-translational modification to produce a degraded post-translationally modified polypeptide.
After cleavage at the site of post-translational modification, the site of post-translational
modification is determined.

[0053] Site-specific cleavage refers to peptide bond hydrolysis at a preferred site in a
polypeptide. For example, many endopeptidases cleave the amide backbone of polypeptides
site-specifically at a preferred amino acid residue and/or residues. Endopeptidases that
site-specifically cleave polypeptides include, for example, chymotrypsin, which site-specifically
cleaves at phenylalanine, tryptophan and tyrosine residues; trypsin, which exhibits
preferential cleavage at lysine and arginine residues; elastase, which site-specifically cleaves
at alanine residues; and subtilisin, which site-specifically cleaves at tyrosine and
phenylalanine residues. Similarly, endopeptidases of the present invention that cleave
site-specifically at a site of post-translational modification exhibit preferential cleavage at amino
acid residues that have been post-translationally modified. More detailed information
regarding known protease cleavage sites may be found, for example, in Matayoshi et al.,
Science 247: 954 (1990); Dunn et al., Meth. Enzymol. 241: 254 (1994); Seidah et al., Meth.
Enzymol. 244: 175 (1994); Thornberry, Meth. Enzymol. 244: 615 (1994); Weber, Meth.
Enzymol. 244: 595 (1994); Smith et al., Meth. Enzymol. 244: 412 (1994); Bouvier et al., Meth.
Enzymol. 248: 614 (1995), and Hardy et al., in AMYLOID PROTEIN PRECURSOR IN
DEVELOPMENT, AGING, AND ALZHEIMER'S DISEASE, ed. Masters et al. pp. 190-198 (1994).
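
As a rough illustration of site-specific cleavage, the residue preferences listed above can be written as a lookup table and applied to a one-letter amino acid sequence. This is a minimal sketch only: the table `CLEAVAGE_RULES`, the function `digest`, and the example sequence are hypothetical, and the sketch assumes hydrolysis of the peptide bond on the C-terminal side of each preferred residue, which the paragraph above does not specify.

```python
# Residue preferences as stated in the paragraph above. The table layout and
# all names here are illustrative assumptions, not part of the disclosure.
CLEAVAGE_RULES = {
    "chymotrypsin": set("FWY"),
    "trypsin": set("KR"),
    "elastase": set("A"),
    "subtilisin": set("YF"),
}


def digest(sequence, enzyme):
    """Split a one-letter sequence after each residue preferred by `enzyme`.

    Assumption: the bond C-terminal to the preferred residue is hydrolyzed.
    """
    preferred = CLEAVAGE_RULES[enzyme]
    fragments = []
    start = 0
    last_index = len(sequence) - 1
    for index, residue in enumerate(sequence):
        # The final residue has no C-terminal peptide bond to cleave.
        if residue in preferred and index != last_index:
            fragments.append(sequence[start:index + 1])
            start = index + 1
    fragments.append(sequence[start:])
    return fragments


# Hypothetical test sequence.
print(digest("GASKLMFRTEY", "trypsin"))
print(digest("GASKLMFRTEY", "chymotrypsin"))
```

An endopeptidase of the invention could be modeled in the same way by adding an entry whose preferred "residue" is a symbol standing for the modified amino acid.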

[0054] A wide variety of methods are useful in determining the specificity of site-specific
cleavage. For example, a test polypeptide containing a fluorescent donor-fluorescent
quencher pair can be used to measure the kinetics of cleavage by an endopeptidase. See, for
example, Meldal et al., Anal. Biochem. 195: 141-7 (1991) and Examples section. The
cleavage kinetics of a test polypeptide containing a particular post-translational modification
may be measured and subsequently compared to the cleavage kinetics of a series of control
polypeptides that do not contain the post-translational modification. Typically, the test
polypeptide contains the same amino acid sequence as the control peptides, with the
exception that the amino acid containing the post-translational modification in the test
polypeptide is substituted with another amino acid in the control polypeptide amino acid
sequences. The amino acid containing the post-translational modification may be substituted,
for example, with an unmodified natural amino acid, an unmodified non-natural amino acid,
the same amino acid containing a different post-translational modification, a different amino
acid containing the same post-translational modification, and/or a different amino acid
containing a different post-translational modification.

[0055] In an exemplary embodiment, an endopeptidase site-specifically cleaves a
polypeptide at a site of post-translational modification when the kcat/Km ratio for the
post-translationally modified test polypeptide is higher than the kcat/Km ratio for a control
polypeptide or a series of control peptides that do not contain the post-translational
modification. In another exemplary embodiment, an endopeptidase site-specifically cleaves
at a site of post-translational modification when the kcat/Km ratio is at least about 1.1, 1.2, 1.3,
1.4, 1.5, 1.6, 1.7, 1.8, or 1.9 times higher for the modified test polypeptide than the kcat/Km
ratio for the control polypeptide(s). In another exemplary embodiment, an endopeptidase
site-specifically cleaves at a site of post-translational modification when the kcat/Km ratio is at
least about 2, 3, 4, 5, 6, 7, 8, 9, or 10 fold higher for the modified test polypeptide than the
kcat/Km ratio for the control polypeptide(s).
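
The kcat/Km comparison described above reduces to a simple ratio test. The following is a minimal sketch under stated assumptions: the function names and the kinetic constants are hypothetical, and where a series of control peptides is used, the sketch compares the test polypeptide against the control with the highest kcat/Km, which is one conservative reading of the text.

```python
def specificity_constant(kcat, km):
    """Return kcat/Km for one substrate (kcat in 1/s, Km in M)."""
    return kcat / km


def fold_preference(test, controls):
    """Ratio of the test kcat/Km to the highest control kcat/Km.

    `test` is a (kcat, Km) pair; `controls` is a list of (kcat, Km) pairs.
    Using the highest control value is an assumption of this sketch.
    """
    test_value = specificity_constant(*test)
    highest_control = max(specificity_constant(kcat, km) for kcat, km in controls)
    return test_value / highest_control


def cleaves_site_specifically(test, controls, min_fold=None):
    """Apply either the plain 'higher than' test or an 'at least N-fold' test."""
    fold = fold_preference(test, controls)
    if min_fold is None:
        return fold > 1.0
    return fold >= min_fold


# Hypothetical kinetic constants, for illustration only.
modified = (12.0, 4.0e-5)
unmodified_controls = [(3.0, 1.0e-4), (1.0, 2.0e-4)]
print(cleaves_site_specifically(modified, unmodified_controls))
print(cleaves_site_specifically(modified, unmodified_controls, min_fold=5))
```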

[0056] The endopeptidases of the present invention are capable of site-specifically cleaving
a polypeptide at a site containing any suitable post-translational modification. Over 300
post-translational modifications are currently known. See the world wide web at URL,
http://www.abrf.org/index.cfm/dm.home?AvgMass=all, Delta Mass, A Database of Protein
Post-Translational Modifications. Exemplary suitable post-translational modifications
include phosphorylation, sulfonation, glycosylation, acetylation, methylation,
ADP-ribosylation, methionine oxidation, cysteine oxidation, cysteine lipidation, farnesylation, and
geranylation.

[0057] In an exemplary embodiment, the post-translational modification is
phosphorylation. Typically, post-translational phosphorylation occurs at a tyrosine, serine,
and/or threonine. In a related exemplary embodiment, the endopeptidase site-specifically
cleaves a polypeptide at a phosphorylated tyrosine, serine, or threonine. In another related
embodiment, the endopeptidase site-specifically cleaves a polypeptide at a phosphorylated
tyrosine.

[0058] In another exemplary embodiment, the post-translational modification is
sulfonation. In a related embodiment, the endopeptidase site-specifically cleaves a
polypeptide at a sulfonated tyrosine.

[0059] The present method includes site-specifically cleaving a post-translationally
modified polypeptide at a site of post-translational modification with an endopeptidase.
Typically, an endopeptidase that cleaves at a site of post-translational modification
hydrolyzes a peptide bond between two adjacent amino acid residues, wherein the peptide
bond is within 10 amino acids in either direction of the polypeptide amino acid containing the
post-translational modification. For example, where a tyrosine is phosphorylated, the
endopeptidases of the present invention will site-specifically cleave the polypeptide at a
peptide bond within 10 amino acid residues, in either the N-terminal direction or the
C-terminal direction, of the phosphorylated tyrosine. Thus, site-specific cleavage at a site of
post-translational modification typically refers to cleavage at a peptide bond between two
amino acids, wherein the peptide bond is within ten amino acids in either direction of the
post-translationally modified amino acid.

[0060] In an exemplary embodiment, the endopeptidase site-specifically cleaves a
post-translationally modified peptide at a peptide bond within 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 amino
acids of the post-translationally modified amino acid. In another exemplary embodiment, the
endopeptidase site-specifically cleaves a post-translationally modified peptide at a peptide
bond between the post-translationally modified amino acid and the amino acid immediately
C-terminal to the post-translationally modified amino acid or the amino acid immediately
N-terminal to the post-translationally modified amino acid. Thus, the site of cleavage may be at the
peptide bond between the post-translationally modified amino acid and an amino acid
adjacent to the post-translationally modified amino acid.
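
The window of eligible peptide bonds around a modified residue can be enumerated directly. The sketch below is illustrative: the function name and the 1-based numbering are assumptions, and it adopts one possible reading of "within N amino acids", namely that a bond between residues i and i+1 qualifies when neither residue is more than N positions from the modified residue. Under that reading, N = 1 yields exactly the two bonds adjacent to the modified amino acid.

```python
def bonds_near_site(sequence_length, modified_position, window=10):
    """List peptide bonds (i, i + 1) lying within `window` residues of a site.

    Positions are 1-based. A bond qualifies when both residues it joins are
    at most `window` positions from the modified residue (an assumption of
    this sketch; the text does not define the distance measure).
    """
    if not 1 <= modified_position <= sequence_length:
        raise ValueError("modified_position must lie within the polypeptide")
    bonds = []
    for i in range(1, sequence_length):
        farthest = max(abs(i - modified_position), abs(i + 1 - modified_position))
        if window >= farthest:
            bonds.append((i, i + 1))
    return bonds


# Hypothetical 30-residue polypeptide modified at residue 15.
print(bonds_near_site(30, 15, window=1))
print(len(bonds_near_site(30, 15, window=10)))
```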

[0061] The present method also includes determining the site of post-translational
modification after cleavage at the site of post-translational modification using the
endopeptidases of the present invention. A variety of methods are useful in determining the
site of post-translational modification after cleavage. Typically, the methods involve
analyzing the degraded post-translationally modified polypeptide produced by cleaving the
post-translationally modified polypeptide with an endopeptidase of the present invention.
Exemplary methods include determining the fragmentation pattern of the polypeptide
fragments and comparing the pattern to a known or predicted pattern, determining the size of
the polypeptide fragments, determining the sequence of the polypeptide fragments produced,
and quantitating the amount of polypeptide fragments produced. A variety of analytical tools
may be employed in conjunction with these methods, including gel electrophoresis (such as
single and multi-dimensional electrophoresis), mass spectrometry (including mass
spectrometry polypeptide sequencing techniques), high performance liquid chromatography
(HPLC), nuclear magnetic resonance (NMR), capillary gel electrophoresis, affinity
chromatography, Edman degradation, high throughput protein chip technology, and the like.



[0062] In an exemplary embodiment, the site of post-translational modification is
determined by sequencing the polypeptide fragments produced by cleaving the polypeptide
with the endopeptidases of the present invention. Sequencing can be accomplished using any
suitable technique, such as Edman degradation or mass spectrometry.

[0063] In another exemplary embodiment, the site of post-translational modification is
determined from the fragmentation pattern of the degraded post-translationally modified
polypeptide produced by the endopeptidases of the current invention. The fragmentation
pattern may be compared to predicted fragmentation patterns of known polypeptide
sequences, thereby identifying the sites of post-translational modifications. Alternatively, the
fragmentation pattern may be compared to a plurality of empirically produced fragmentation
patterns to determine the site of post-translational modification. After cleavage,
fragmentation patterns may be produced by a variety of methods, including, for example,
mass spectrometry and two dimensional gel electrophoresis. These and other methods are
discussed in more detail in the "Informatics" section below.
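
The comparison of an observed fragmentation pattern with predicted patterns can be sketched for the mass spectral case. The following is a simplified illustration and not the informatics method of the invention: it assumes a single phosphotyrosine, cleavage of the bond immediately C-terminal to it, complete digestion into two fragments, and neutral monoisotopic masses. The function names, the example sequence, and the mass tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PHOSPHATE = 79.96633  # mass added by one phosphoryl group (HPO3)


def peptide_mass(peptide, phosphorylated=False):
    """Neutral monoisotopic mass of a linear peptide."""
    mass = sum(RESIDUE_MASS[residue] for residue in peptide) + WATER
    return mass + PHOSPHATE if phosphorylated else mass


def predicted_pattern(sequence, site):
    """Fragment masses expected if the residue at 0-based `site` is
    phosphorylated and the bond C-terminal to it is cleaved (assumption)."""
    n_fragment = sequence[:site + 1]
    c_fragment = sequence[site + 1:]
    return sorted([peptide_mass(n_fragment, True), peptide_mass(c_fragment)])


def map_site(sequence, observed_masses, candidates="Y", tolerance=0.01):
    """Return candidate sites whose predicted pattern matches the observation."""
    observed = sorted(observed_masses)
    matches = []
    for site, residue in enumerate(sequence):
        if residue not in candidates or site == len(sequence) - 1:
            continue
        predicted = predicted_pattern(sequence, site)
        if len(predicted) == len(observed) and all(
            tolerance >= abs(p - o) for p, o in zip(predicted, observed)
        ):
            matches.append(site)
    return matches


# Hypothetical polypeptide with tyrosines at 0-based positions 2 and 6;
# the "observed" masses are simulated for phosphorylation at position 6.
sequence = "GAYSLKYEVR"
observed = predicted_pattern(sequence, 6)
print(map_site(sequence, observed))
```

A real data set would contain incomplete digestion products, charge states, and measurement error, so the exact-match test used here would need to be replaced by a scoring scheme.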

[0064] Post-translationally modified polypeptides of use in the present invention may be of
any biological or synthetic origin. For example, the post-translationally modified polypeptide
may be produced using known chemical techniques, such as solid phase peptide synthesis on
a solid support, wherein post-translationally modified amino acids (either protected or
unprotected) are incorporated into the polypeptide chain during synthesis (see Stewart et al.,
Solid Phase Peptide Synthesis, Second Edition (1984)). Alternatively, an unmodified
polypeptide chain may be chemically synthesized and subsequently contacted with an
enzyme in vitro to create a synthetic post-translationally modified polypeptide. In an
exemplary embodiment, an unmodified polypeptide is synthesized using solid phase peptide
synthesis and subsequently phosphorylated with a protein tyrosine kinase to produce a
post-translationally modified polypeptide.

[0065] In another exemplary embodiment, the post-translationally modified polypeptide is
produced in a cell. Using recombinant methods, a secretory signal sequence may be included
in the polypeptide sequence so that the post-translationally modified polypeptide is secreted
from the cell, thus simplifying purification procedures. Exemplary amino acid signal
sequences and nucleic acid sequences that encode the signal sequence are described, for
example, in Wells et al., Nucleic Acids Research, 11: 7911-7925 (1983), and in Figures 7 and
8 (in bold). In another exemplary embodiment, recombinant methods may be used to include
an endopeptidase prodomain, such as the subtilisin prodomain shown in Figures 7 and 8
(underlined).

[0066] In another exemplary embodiment, the post-translationally modified polypeptide is
produced by a diseased host, wherein at least one post-translational modification is a marker
of disease. In a related embodiment, the post-translational modification that is a disease
marker is sulfonation of a tyrosine. In another exemplary embodiment, the
post-translationally modified polypeptide is derived from a non-diseased host. In another
exemplary embodiment, the post-translationally modified polypeptide is targeted for cleavage
with endopeptidases of the present invention by at least partially purifying the
post-translationally modified polypeptide before cleavage with the endopeptidase.

[0067] Endopeptidases

[0068] In another aspect, the present invention provides an endopeptidase that
site-specifically cleaves a peptide bond of a post-translationally modified polypeptide at a site of
post-translational modification, wherein the endopeptidase comprises an active site that binds
to the site of post-translational modification.

[0069] The active site of an endopeptidase of the present invention refers to the area of the
endopeptidase that binds to the post-translationally modified polypeptide and contains the
amino acid side chains involved in peptide bond hydrolysis. Typically, the active site
contains amino acids that bind to the post-translational modification itself in addition to other
areas of the post-translationally modified polypeptide, such as other amino acid side chains or
polypeptide backbone carbonyl and/or amine groups. The binding and catalytic properties of
an active site are determined by the three dimensional arrangement of the amino acid side
chains within the active site.

[0070] The active site of the endopeptidase may bind to the post-translational modification
using any suitable molecular binding interaction. Typically, the binding interaction is a
non-covalent interaction. Useful non-covalent binding interactions include, for example, ionic
interactions, hydrogen bonding, Van der Waals interactions, dipole-dipole interactions, pi-pi
stacking interactions, and/or hydrophobic interactions. The active site may also increase
binding interactions to the post-translational modification by containing a suitable space for
the post-translational modification to fit within the active site, thus avoiding steric clashes
between the endopeptidase active site amino acids and the post-translational modification.

[0071] A variety of post-translational modifications are bound by the endopeptidases of the
present invention. For example, where the post-translational modification is phosphorylation,
the endopeptidase active site typically contains one or more positively charged amino acid
side chains that ionically bind to the negatively charged phosphate moiety. In another
example, where the post-translational modification is glycosylation, the endopeptidase active
site comprises cyclic amino acid side chain residues that stack above or below a sugar ring of
the glycosylation modification.

[0072] Endopeptidases are proteases that cleave a non-terminal peptide bond of a
polypeptide substrate. Proteases have been found to contain common structural features (see
Stawiski et al., Proc. Natl. Acad. Sci., 97: 3954-3958 (2000)). For example, relative to
proteins of similar size, proteases have smaller than average surface areas, smaller radii of
gyration, higher Cα densities, are more tightly packed than other proteins, and have fewer
helices and more loops. Based on these structural similarities, protease function has been
predicted with over 86% accuracy from the primary amino acid sequence of polypeptides
(Id.).

[0073] In an exemplary embodiment, the endopeptidase is a serine protease. Serine
proteases of the present invention differ from previously known serine proteases in that they
are able to site-specifically cleave a post-translationally modified polypeptide at a site of
post-translational modification. However, the serine proteases of the present invention
typically retain the features of the enzymes within the sub-subclass EC 3.4.21. In addition,
the serine proteases of the present invention retain the "catalytic triad" active site structural
motif common to all previously known serine proteases, explained below.

[0074] Endopeptidases within the serine protease family are structurally related through a
common active site structural motif (see Stroud, Sci. Am., 231: 74-88 (1974)). The active site
structural motif is commonly referred to as the "catalytic triad," which includes a specific
three-dimensional arrangement of three amino acids: serine, histidine, and aspartate (see
Russell, J. Mol. Biol., 279: 1211-1227 (1998)). The three amino acids act in concert to cleave
the peptide bond of a polypeptide. The catalytic mechanism involves attack of the serine
hydroxyl side chain onto the carbonyl moiety of the peptide bond to form a tetrahedral
intermediate, followed by general acid catalysis of the intermediate by the aspartate-polarized
histidine (see Voet et al., Biochemistry, Second Ed., p. 3^5 (1995)).

[0075] The three-dimensional structure of the catalytic triad is sufficiently similar between
the members of the serine protease family that the serine protease catalytic triad can be
accurately detected from the amino acid sequence alone (see Fischer et al., Protein Sci., 3:
769-788 (1994); Wallace et al., Protein Sci., 5: 1001-1013 (1996); Wallace et al., Protein
Sci., 6: 2308-2323 (1997); Russell, J. Mol. Biol., 279: 1211-1227 (1998)). Methods for
determining the presence of the serine protease catalytic triad typically involve predicting the
angles and distances between amino acids in the active site of a protein using computer-based
algorithms that analyze the primary structure of the protein. In some methods, the amino acid
sequence is additionally considered in determining serine protease identity (see Russell, J.
Mol. Biol., 279: 1211-1227 (1998)). Although all serine proteases may not share a high
degree of amino acid sequence identity, one skilled in the art will recognize common serine
protease structures by analyzing the three dimensional structure of the active site and
detecting the presence of the serine/histidine/aspartate catalytic triad. In fact, the three
dimensional spatial relationships of the active site of enzymes are often more informative
than the one-dimensional primary sequence alone (Russell, J. Mol. Biol., 279: 1211-1227
(1998)). For example, although trypsin, chymotrypsin and elastase share similar function,
three dimensional backbone structure, and catalytic triad structure, only 24 percent of the
amino acids are common to all three of these enzymes (see Stroud, Sci. Am., 231: 74-88
(1974)).

[0076] In another related embodiment, the endopeptidase is a trypsin serine protease.
Trypsin serine proteases of the present invention differ from previously known trypsin serine
proteases in that they are able to site-specifically cleave a post-translationally modified
polypeptide at a site of post-translational modification. However, the trypsin serine proteases
of the present invention retain the three dimensional catalytic triad and the non-active site
elements of secondary and tertiary structure of previously known trypsin serine proteases. In
an exemplary embodiment, trypsin serine proteases are those having the serine protease
catalytic triad structure and the following structural characteristics according to the CATH
protein structural classification: class 2 (mainly beta), architecture 2.40 (barrel), topology
2.40.10 (thrombin subunit H), homologous superfamily 2.40.10.10 (trypsin-like serine
protease), and sequence family 2.40.10.10.2 (trypsin-like serine protease). The trypsin serine
proteases of the present invention typically retain the three dimensional catalytic triad and the
non-active site elements of secondary and tertiary structure of those enzymes included within
sub-subclass EC 3.4.21.4.

[0077] In another related embodiment, the endopeptidase is a subtilisin. Subtilisins of the
present invention differ from previously known subtilisins in that they are able to
site-specifically cleave a post-translationally modified polypeptide at a site of post-translational
modification. However, the subtilisins of the present invention retain the three dimensional
catalytic triad and the non-active site elements of secondary and tertiary structure of known
subtilisin enzymes. In an exemplary embodiment, the subtilisin of the present invention is a
single polypeptide chain that folds into three distinct regions and contains eight α-helices
(designated A-H, see Wright et al., Nature, 221: 235-242 (1969)). In another exemplary
embodiment, the subtilisin of the present invention retains the three dimensional catalytic
triad and the non-active site elements of secondary and tertiary structure of those enzymes
included within sub-subclass EC 3.4.21.62.

[0078] In another exemplary embodiment, the endopeptidase is a cysteine protease.
Common active site structural motifs have been used to successfully identify members of the
cysteine protease family (see Russell, J. Mol. Biol., 279: 1211-1227 (1998)). Although
cysteine proteases lack the serine/histidine/aspartate catalytic triad of the serine protease
family, similarity in the overall tertiary side chain pattern and shape of the active site may be
used to identify members of the cysteine protease family. In a related exemplary
embodiment, the cysteine protease is any enzyme of the sub-subclass EC 3.4.22, which
consists of proteinases characterized by having a cysteine residue at the active site and by
being irreversibly inhibited by sulfhydryl reagents such as iodoacetate. Mechanistically, in
catalyzing the cleavage of a peptide amide bond, cysteine proteases form a covalent
intermediate, called an acyl enzyme, that involves a cysteine and a histidine residue in the
active site (Cys25 and His159 according to papain numbering, for example).

[0079] In another exemplary embodiment, the endopeptidase of the present invention is
encoded by a nucleic acid sequence that hybridizes under highly stringent hybridization
conditions to a nucleic acid encoding a polypeptide comprising an amino acid sequence of
Figure 1. In a related embodiment, the amino acid sequence additionally contains a
prodomain sequence as shown in Figure 7 (underlined). In another related embodiment, the
amino acid sequence additionally contains a signal sequence as shown in Figure 7 (in bold).
Typically, the hybridization reaction is incubated at 42°C in a solution comprising 50%
formamide, 5x SSC and 1% SDS, and washed at 65°C in a solution comprising 0.2x SSC and
0.1% SDS. In a related embodiment, the endopeptidase contains at least one amino acid
substitution selected from P129G, E156R, S191K, G166K, G127S, E156K, P129K, P129R,
S159R, and E156G (see Figure 1). In another related embodiment, the endopeptidase
contains one of the following combinations of substitution point mutations: G127S and
E156R; P129G and E156R; P129G and E156K; E156R and S191K; P129K and E156R;
P129R and E156K; E156K and G166K; E156K and S191K; E156K and S191K and S159R;
P129R and E156R; and E156G and G166K. In another related embodiment, the
endopeptidase contains a subsequence as described above and contains one or two amino acid
substitutions selected from P129G, E156R, S191K, G166K, and G127S.
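
Substitution labels such as E156R follow the usual convention of original residue, position, and replacement residue. The sketch below shows one way to parse such labels and apply them to a sequence. It is illustrative only: the function names are hypothetical, and the short toy sequence stands in for the Figure 1 sequence, which is not reproduced here, so the positions used in the example are not those of the disclosed substitutions.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'E156R' into ('E', 156, 'R')."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, labels):
    """Return `sequence` with each labeled substitution applied.

    Raises ValueError if a label's original residue does not match the
    residue actually found at that position.
    """
    residues = list(sequence)
    for label in labels:
        original, position, replacement = parse_substitution(label)
        if position > len(residues) or position == 0:
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != original:
            raise ValueError(f"{label}: found {residues[position - 1]} at {position}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_substitution("E156R"))
# Toy sequence and positions, chosen only to exercise the functions.
print(apply_substitutions("MPEGSAKT", ["P2G", "E3R"]))
```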

[0080] In another exemplary embodiment, the endopeptidase contains a subsequence
having at least 70% amino acid sequence identity to an amino acid sequence of Figure 1. In a
related embodiment, the subsequence has 75%, 76%, 77%, 78%, 79%, 80%, 85%, 86%, 87%,
88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 98% amino acid sequence
identity.
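
Percent sequence identity can be computed once two sequences have been aligned. The sketch below is a simplified illustration: it assumes the alignment has already been produced by some other tool, uses "-" as the gap symbol, and divides by the number of aligned columns that are not gaps in both sequences. Other denominators are in common use, and the text above does not state which convention applies.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns that are gaps in both sequences are ignored; a column with a gap
    in only one sequence counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True when identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned subsequences differing at two of ten positions.
print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))
print(meets_threshold("ACDEFGHIKL", "ACDQFGHIRL"))
```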

[0081] In another related embodiment, the endopeptidase contains a subsequence as
described above and contains at least one amino acid substitution selected from P129G,
E156R, S191K, G166K, G127S, E156K, P129K, P129R, S159R, and E156G. In another
related embodiment, the endopeptidase contains a subsequence as described above and
contains one of the following combinations of substitution point mutations: G127S and
E156R; P129G and E156R; P129G and E156K; E156R and S191K; P129K and E156R;
P129R and E156K; E156K and G166K; E156K and S191K; E156K and S191K and S159R;
P129R and E156R; and E156G and G166K. In another related embodiment, the
endopeptidase contains a subsequence as described above and contains one or two amino acid
substitutions selected from P129G, E156R, S191K, G166K, and G127S. In another related
embodiment, the amino acid sequence additionally contains a prodomain sequence as shown
in Figure 7 (underlined). In another related embodiment, the amino acid sequence
additionally contains a signal sequence as shown in Figure 7 (in bold).

[0082] In another exemplary embodiment, the endopeptidase of the present invention is
encoded by an expression vector. In another exemplary embodiment, a host cell contains the
expression vector. A variety of host cells may be used in the methods of the present
invention (see "Expression in Eukaryotes and Prokaryotes" below). In an exemplary
embodiment, the host cell is B. subtilis.

Production of Endopeptidases

[0083] In another aspect, the endopeptidases of the present invention are produced by a
method that includes introducing one or more point mutations into a model endopeptidase at
one or more candidate amino acid positions in an active site of the model endopeptidase to
produce a plurality of candidate endopeptidases. At least one of the plurality of the candidate
endopeptidases is an endopeptidase of the present invention that site-specifically cleaves a
peptide bond of a post-translationally modified polypeptide at a site of post-translational
modification. The endopeptidase that site-specifically cleaves at the site of post-translational
modification is then identified. Typically, the endopeptidase identification is accomplished
by assaying the candidate endopeptidases.

[0084] A variety of model endopeptidases are useful in the current invention. Typically,
the model endopeptidase of the current invention cleaves a peptide bond of a polypeptide at a
specific site. Examples include chymotrypsin, which site-specifically cleaves at phenylalanine,
tryptophan and tyrosine residues; trypsin, which exhibits preferential cleavage at lysine and
arginine residues; and elastase, which site-specifically cleaves at alanine residues.
Exemplary model endopeptidases include, for example, serine proteases within the
sub-subclass EC 3.4.21 and cysteine proteases within the sub-subclass EC 3.4.22. In an
exemplary embodiment, the serine protease is a trypsin endopeptidase within the sub-subclass
EC 3.4.21.4 or a subtilisin endopeptidase within the sub-subclass EC 3.4.21.62.

[0085] In an exemplary embodiment, the model endopeptidase is encoded by a nucleic acid
sequence that hybridizes under highly stringent hybridization conditions to a nucleic acid
encoding a polypeptide comprising an amino acid sequence of Figure 1, wherein the
hybridization reaction is incubated at 42°C in a solution comprising 50% formamide, 5x SSC
and 1% SDS, and washed at 65°C in a solution comprising 0.2x SSC and 0.1% SDS. In a
related embodiment, the amino acid sequence additionally contains a prodomain sequence as
shown in Figure 7 (underlined). In another related embodiment, the amino acid sequence
additionally contains a signal sequence as shown in Figure 7 (in bold).

[0086] In another related embodiment, the model endopeptidase contains a subsequence having
at least 70% amino acid sequence identity to an amino acid sequence of Figure 1. In another
related embodiment, the subsequence has at least 75%, 76%, 77%, 78%, 79%, 80%, 85%,
86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%
sequence identity to an amino acid sequence of Figure 1. In a related embodiment, the amino
acid sequence additionally contains a prodomain sequence as shown in Figure 7 (underlined).
In another related embodiment, the amino acid sequence additionally contains a signal
sequence as shown in Figure 7 (in bold).

[0087] Endopeptidases of the present invention are typically produced by a method that
includes introducing one or more point mutations into the active site of a model
endopeptidase. As explained above, a point mutation refers to a deletion, addition, or
substitution at a designed amino acid position in an amino acid or nucleotide sequence. In an
exemplary embodiment, the one or more point mutations are one or more amino acid
substitutions. In another exemplary embodiment, one or two substitution point mutations are
introduced into the active site of a model endopeptidase.

[0088] The point mutations are introduced at candidate amino acid positions within the
active site of the model endopeptidase preferably after examining the three dimensional
structure of the model endopeptidase. In an exemplary embodiment, before introducing one
or more point mutations to a model endopeptidase at one or more candidate amino acid
positions, the one or more candidate amino acid positions are identified by a method that
includes generating a three-dimensional structure of the model endopeptidase active site.

[0089] In another exemplary embodiment, the one or more candidate amino acid positions
are identified by a method that includes generating a three-dimensional structure of the model
endopeptidase active site and a three-dimensional structure of the post-translationally
modified polypeptide. The three-dimensional structure of the model endopeptidase active
site is compared with the site of the post-translationally modified polypeptide, thereby
identifying one or more candidate amino acid positions. Point mutations are then introduced
into the candidate amino acid positions to generate a plurality of candidate endopeptidases.
Upon introduction of one or more point mutations at one or more of the candidate amino acid
positions, a plurality of candidate endopeptidases is produced. Typically, at least one of the
plurality of candidate endopeptidases is an endopeptidase that site-specifically cleaves a
peptide bond of a post-translationally modified polypeptide at a site of post-translational
modification.

[0090] In a related embodiment, amino acid substitutions at the candidate amino acid
positions are rationally designed by generating a three-dimensional structure of potential
candidate endopeptidases before generating the actual candidate endopeptidases using
recombinant techniques. The three-dimensional structure of a potential candidate
endopeptidase is compared to the post-translationally modified polypeptide to determine
whether the point mutation provides one or more binding interactions with the
post-translationally modified polypeptide.

[0091] For example, where the post-translationally modified polypeptide is a
phosphotyrosine polypeptide, the three-dimensional structure of the phosphotyrosine
polypeptide is compared to the three-dimensional structure of the model endopeptidase, such
as trypsin. A candidate amino acid position is identified in the trypsin active site that is
potentially within ionic bonding distance of the phosphate moiety of the phosphotyrosine
polypeptide. However, the amino acid that occupies the candidate amino acid position (for
example, an alanine or valine residue) in trypsin will typically not be capable of forming an
ionic bond with the negatively charged phosphate moiety. Therefore, a three-dimensional
structure of a potential candidate endopeptidase is generated that contains, for example, an
arginine substitution at the candidate amino acid position. The structure of the potential
candidate endopeptidase is then compared with the phosphotyrosine structure to determine
whether or not the arginine forms an ionic bond with the phosphate moiety. If a bond appears
to be possible from the comparison, a candidate endopeptidase is generated containing an
arginine substitution point mutation. The candidate endopeptidase is then assayed to
determine whether or not it site-specifically cleaves the phosphotyrosine polypeptide at the
site of phosphorylation.
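
By way of illustration only, the search for active-site positions within ionic bonding
distance of the phosphate moiety can be sketched as a simple distance screen over atomic
coordinates. The 4.0 angstrom cutoff, the residue labels, and the coordinate layout below
are assumptions made for the example and are not taken from this disclosure; the docking
and modeling programs named elsewhere herein perform far more complete analyses.

```python
import math

# Assumed cutoff for a possible ionic interaction, in angstroms (hypothetical value).
IONIC_CUTOFF_ANGSTROM = 4.0


def candidate_positions(phosphate_xyz, residue_atoms, cutoff=IONIC_CUTOFF_ANGSTROM):
    """Return (residue label, nearest distance) pairs within the cutoff, nearest first.

    residue_atoms maps a residue label to a list of (x, y, z) side-chain atom
    coordinates in the same frame as phosphate_xyz.
    """
    hits = []
    for label, atoms in residue_atoms.items():
        nearest = min(math.dist(phosphate_xyz, atom) for atom in atoms)
        if nearest <= cutoff:
            hits.append((label, round(nearest, 2)))
    return sorted(hits, key=lambda item: item[1])
```

A position returned by such a screen is only a candidate; whether a substituted side chain
can actually reach the phosphate still has to be judged from the modeled structure and
confirmed in a cleavage assay.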

[0092] The amino acid substitution will typically depend on the type of interaction desired.
For example, where the post-translationally modified polypeptide contains a charged moiety,
an ionic interaction may be desired. Amino acids with a charged side chain may be
substituted for the amino acid in the model endopeptidase at a candidate amino acid
position within ionic bonding distance of the charged moiety. Amino acids with side chains
capable of forming an ionic bond with a negatively charged moiety include lysine (pK 10.54),
arginine (pK 12.48) and histidine (pK 6.04). Amino acids with side chains capable of
forming an ionic bond with a positively charged moiety include aspartic acid (pK 3.9),
glutamic acid (pK 4.07), tyrosine (pK 10.46), and cysteine (pK 8.37). Amino acids with side
chains capable of forming a pi-pi stacking interaction with a polypeptide aromatic moiety
include phenylalanine, tryptophan, and tyrosine. Amino acids with side chains capable of
forming hydrogen bonds with a polypeptide moiety include methionine, tryptophan, serine,
threonine, asparagine, glutamine, tyrosine, cysteine, lysine, arginine, histidine, aspartic acid
and glutamic acid. Amino acids with small side chains capable of avoiding steric clashes
include glycine, alanine and valine. Amino acids with side chains capable of participating in
hydrophobic interactions include alanine, valine, leucine, isoleucine, phenylalanine and
proline. In an exemplary embodiment, where the post-translationally modified peptide
contains a charged moiety, only a single charged amino acid is introduced in the active site of
the model endopeptidase. In a related embodiment, only a single positively charged amino
acid is introduced into the active site of a model endopeptidase to bind to a phosphorylated
polypeptide.
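
By way of illustration only, the side-chain pK values listed above can be used with the
Henderson-Hasselbalch relationship to estimate what fraction of a given side chain carries
its charge at a chosen pH. The sketch below is an assumption-laden aid to the selection
described above, not a disclosed method: the function names are hypothetical, and the pK of
a residue inside an enzyme active site may differ substantially from these free side-chain
values.

```python
# Side-chain pK values as listed in the paragraph above.
BASIC_PK = {"lysine": 10.54, "arginine": 12.48, "histidine": 6.04}
ACIDIC_PK = {
    "aspartic acid": 3.9,
    "glutamic acid": 4.07,
    "tyrosine": 10.46,
    "cysteine": 8.37,
}


def charged_fraction(residue: str, ph: float) -> float:
    """Fraction of the side chain in its charged form at the given pH."""
    if residue in BASIC_PK:
        # Basic side chains are charged when protonated.
        return 1.0 / (1.0 + 10 ** (ph - BASIC_PK[residue]))
    if residue in ACIDIC_PK:
        # Acidic side chains are charged when deprotonated.
        return 1.0 / (1.0 + 10 ** (ACIDIC_PK[residue] - ph))
    raise KeyError(f"no pK listed for {residue!r}")


def rank_partners_for_negative_moiety(ph: float) -> list:
    """Basic residues ordered from most to least positively charged at the pH."""
    return sorted(BASIC_PK, key=lambda r: charged_fraction(r, ph), reverse=True)
```

At a pH near neutrality this ranking places arginine and lysine ahead of histidine, which is
consistent with the use of arginine and lysine substitutions to bind a phosphate moiety.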

[0093] Typically, a computer program is used to generate the three-dimensional structures
of the post-translationally modified polypeptide, the model endopeptidase, and/or the
potential candidate endopeptidases. The three-dimensional computer-generated structures
may be based on X-ray crystallographic data or NMR data. Alternatively, the structures may
be predicted from the primary structure of the endopeptidase and/or post-translationally
modified peptide using a computer algorithm.

[0094] The model endopeptidase structure may be compared with the post-translationally
modified polypeptide structure to identify candidate amino acid positions. In addition, the
potential candidate endopeptidase structures may be compared with the post-translationally
modified polypeptide structure to identify amino acid substitutions suitable for binding the
post-translationally modified polypeptide. A variety of methods are useful in comparing the
three-dimensional structures of the model endopeptidase and/or potential candidate
endopeptidases with the post-translationally modified polypeptide. The comparison typically
includes the use of a computer-based algorithm that identifies binding interactions, potential
binding interactions, and/or steric clashes between the post-translationally modified
polypeptide and the amino acid side chains and peptide backbone of the model endopeptidase
or potential candidate endopeptidase active site. Amino acid side chains that sterically clash
with or surround the post-translationally modified polypeptide are typically identified as
candidate amino acid positions.

[0095] A variety of computer-based algorithms are useful in the present invention.
Useful programs include, for example, InsightII (Accelrys), 3D-Dock (Imperial College),
HEX (Aberdeen University), DOT (UCSD), ICM and input scripts for docking (Scripps),
GRAMM (SUNY/MUSC), PPD (Columbia), BIGGER (Universidade Nova Lisboa),
VAJDA/Camacho refinement (University of Boston), DOCK 4.0 (UCSF), Autodock
(Scripps), FlexX (GMD-SCAI, BioSolveIT GmbH), Darwin (University of Pennsylvania),
and ZDOCK (University of Boston).

[0096] A plurality of candidate endopeptidases is produced by introducing point
mutations at each of the candidate amino acid positions. The candidate endopeptidases may
contain one point mutation or a combination of point mutations at the candidate amino acid
positions. To identify endopeptidases that site-specifically cleave a peptide bond of a
post-translationally modified polypeptide at a site of post-translational modification, the
candidate endopeptidases are typically tested in a cleavage assay.
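
By way of illustration only, the set of candidate endopeptidases carrying one point mutation
or a combination of point mutations can be enumerated programmatically before any
constructs are made. In the sketch below the function name, the input layout, and the
default limit of two mutations per candidate are assumptions chosen for the example.

```python
from itertools import combinations, product


def enumerate_candidates(options, max_mutations=2):
    """List every single and combined substitution up to max_mutations.

    options maps an amino acid position to a (wild-type residue, substitutes)
    pair, e.g. {129: ("P", ["G", "K"])}. Each candidate is returned as a tuple
    of labels such as ("P129G", "E156R").
    """
    candidates = []
    positions = sorted(options)
    for count in range(1, max_mutations + 1):
        for chosen in combinations(positions, count):
            substitute_lists = [options[pos][1] for pos in chosen]
            for substitutes in product(*substitute_lists):
                candidates.append(
                    tuple(
                        f"{options[pos][0]}{pos}{new}"
                        for pos, new in zip(chosen, substitutes)
                    )
                )
    return candidates
```

Because the number of combinations grows quickly with the number of positions and
substitutes, such a list is most useful for planning which candidates to carry forward into
the cleavage assay.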

[0097] A variety of cleavage assays are useful in the current invention. Typically, the
assay involves contacting a candidate endopeptidase with a test polypeptide comprising a
post-translational modification. After contacting the test polypeptide with the candidate
endopeptidase, the test polypeptide or test polypeptide fragments are analyzed to determine
whether or not the candidate endopeptidase site-specifically cleaved the peptide bond of the
test polypeptide at the site of post-translational modification. Methods of analyzing the test
polypeptide or fragments thereof include, for example, sequencing methods (such as Edman
degradation and mass spectrometry), NMR, gel electrophoresis, capillary gel electrophoresis,
HPLC, colorimetric assays, and the like. In an exemplary embodiment, the test peptide
contains a fluorescent donor-fluorescent quencher pair, as described above.

[0098] In an exemplary embodiment, the method of producing the endopeptidases of the
present invention further includes, after performing the cleavage assays, producing one or
more additional candidate endopeptidases. The one or more additional candidate
endopeptidases are typically produced by introducing a new point mutation or new
combination of point mutations in the active site of a candidate endopeptidase to optimize
cleavage specificity. The candidate amino acid sites and the identity of the amino acid
substitution are typically based on the results of the cleavage assay. The one or more
additional candidate endopeptidases are then tested in a second set of cleavage assays. Thus,
the methods of the present invention also include an iterative design process, in which the
steps described herein are repeated to produce an optimized endopeptidase that
site-specifically cleaves a post-translationally modified polypeptide.

General Recombinant DNA Methods 

[0099] The production of endopeptidases of the current invention relies on routine
techniques in the field of recombinant genetics. Basic texts disclosing the general methods of
use in this invention include Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd
ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and
Current Protocols in Molecular Biology (Ausubel et al., eds., 1994).

[0100] For nucleic acids, sizes are given in either kilobases (kb) or base pairs (bp). These
are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced nucleic
acids, or from published DNA sequences. For proteins, sizes are given in kilodaltons (kD) or
amino acid residue numbers. Protein sizes are estimated from gel electrophoresis, from
sequenced proteins, from derived amino acid sequences, or from published protein sequences.

[0101] Oligonucleotides that are not commercially available can be chemically synthesized
according to the solid phase phosphoramidite triester method first described by Beaucage &
Caruthers, Tetrahedron Letts. 22:1859-1862 (1981), using an automated synthesizer, as
described in Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of
oligonucleotides is by either native acrylamide gel electrophoresis or by anion-exchange
HPLC as described in Pearson & Reanier, J. Chrom. 255:137-149 (1983).

[0102] The sequence of the cloned genes and synthetic oligonucleotides can be verified
after cloning using, e.g., the chain termination method for sequencing double-stranded
templates of Wallace et al., Gene 16:21-26 (1981).

Expression in prokaryotes and eukaryotes 

[0103] To obtain high level expression of a cloned gene, such as those cDNAs encoding
endopeptidases, one typically subclones the endopeptidase cDNA into an expression vector
that contains a strong promoter to direct transcription, a transcription/translation terminator,
and if for a nucleic acid encoding a protein, a ribosome binding site for translational
initiation. Suitable bacterial promoters are well known in the art and described, e.g., in
Sambrook et al., and Ausubel et al., supra. Bacterial expression systems for expressing
endopeptidases are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., Gene
22:229-235 (1983); Mosbach et al., Nature 302:543-545 (1983)). Kits for such expression
systems are commercially available. Eukaryotic expression systems for mammalian cells,
yeast, and insect cells are well known in the art and are also commercially available.

[0104] Selection of the promoter used to direct expression of a heterologous nucleic acid
depends on the particular application. The promoter is preferably positioned about the same
distance from the heterologous transcription start site as it is from the transcription start site
in its natural setting. As is known in the art, however, some variation in this distance can be
accommodated without loss of promoter function.

[0105] In addition to the promoter, the expression vector typically contains a transcription
unit or expression cassette that contains all the additional elements required for the
expression of the endopeptidase-encoding nucleic acid in host cells. A typical expression
cassette thus contains a promoter operably linked to the nucleic acid sequence encoding the
endopeptidase and signals required for efficient polyadenylation of the transcript, ribosome
binding sites, and translation termination. Additional elements of the cassette may include
enhancers and, if genomic DNA is used as the structural gene, introns with functional splice
donor and acceptor sites.

[0106] In addition to a promoter sequence, the expression cassette should also contain a
transcription termination region downstream of the structural gene to provide for efficient
termination. The termination region may be obtained from the same gene as the promoter
sequence or may be obtained from different genes.

[0107] The particular expression vector used to transport the genetic information into the
cell is not particularly critical. Any of the conventional vectors used for expression in
eukaryotic or prokaryotic cells may be used. Standard bacterial expression vectors include
plasmids such as pBR322 based plasmids, pSKF, pET23D, and fusion expression systems
such as MBP, GST, and LacZ. Epitope tags can also be added to recombinant proteins to
provide convenient methods of isolation, e.g., c-myc.

[0108] Expression vectors containing regulatory elements from eukaryotic viruses are
typically used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors,
and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include
pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector
allowing expression of proteins under the direction of the CMV promoter, SV40 early
promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus
promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown
effective for expression in eukaryotic cells.

[0109] Expression of proteins from eukaryotic vectors can also be regulated using
inducible promoters. With inducible promoters, expression levels are tied to the
concentration of inducing agents, such as tetracycline or ecdysone, by the incorporation of
response elements for these agents into the promoter. Generally, high level expression is
obtained from inducible promoters only in the presence of the inducing agent; basal
expression levels are minimal. Inducible expression vectors are often chosen if expression of
the protein of interest is detrimental to eukaryotic cells.

[0110] Some expression systems have markers that provide gene amplification such as
thymidine kinase and dihydrofolate reductase. Alternatively, high yield expression systems
not involving gene amplification are also suitable, such as using a baculovirus vector in insect
cells, with the endopeptidase-encoding sequence under the direction of the polyhedrin
promoter or other strong baculovirus promoters.

[0111] The elements that are typically included in expression vectors also include a
replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of
bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions
of the plasmid to allow insertion of eukaryotic sequences. The particular antibiotic resistance
gene chosen is not critical; any of the many resistance genes known in the art are suitable.
The prokaryotic sequences are preferably chosen such that they do not interfere with the
replication of the DNA in eukaryotic cells, if necessary.

[0112] Standard transfection methods are used to produce bacterial, mammalian, yeast or
insect cell lines that express large quantities of endopeptidase, which are then purified using
standard techniques (see, e.g., Colley et al., J. Biol. Chem. 264:17619-17622 (1989); Guide to
Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)).
Transformation of eukaryotic and prokaryotic cells is performed according to standard
techniques (see, e.g., Morrison, J. Bact. 132:349-351 (1977); Clark-Curtiss & Curtiss,
Methods in Enzymology 101:347-362 (Wu et al., eds., 1983)).

[0113] Any of the well-known procedures for introducing foreign nucleotide sequences
into host cells may be used. These include the use of calcium phosphate transfection,
polybrene, protoplast fusion, electroporation, biolistics, liposomes, microinjection, plasma
vectors, viral vectors and any of the other well known methods for introducing cloned
genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see,
e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering
procedure used be capable of successfully introducing at least one gene into the host cell
capable of expressing the endopeptidase.

[0114] After the expression vector is introduced into the cells, the transfected cells are
cultured under conditions favoring expression of the endopeptidase, which is recovered from
the culture using standard techniques identified below.

Purification of Endopeptidases 

[0115] Recombinant endopeptidases can be purified from any suitable expression system
by standard techniques, including selective precipitation with such substances as ammonium
sulfate; column chromatography, immunopurification methods, and others (see, e.g., Scopes,
Protein Purification: Principles and Practice (1982); U.S. Patent No. 4,673,641; Ausubel et
al., supra; and Sambrook et al., supra).

[0116] Recombinant proteins are expressed by transformed bacteria in large amounts,
typically after promoter induction; but expression can be constitutive. Promoter induction
with IPTG is one example of an inducible promoter system. Bacteria are grown according to
standard procedures in the art. Fresh or frozen bacteria cells are used for isolation of protein.

[0117] Proteins expressed in bacteria may form insoluble aggregates ("inclusion bodies").
Several protocols are suitable for purification of the endopeptidase inclusion bodies. For
example, purification of inclusion bodies typically involves the extraction, separation and/or
purification of inclusion bodies by disruption of bacterial cells, e.g., by incubation in a buffer
of 50 mM TRIS/HCl pH 7.5, 50 mM NaCl, 5 mM MgCl2, 1 mM DTT, 0.1 mM ATP, and 1
mM PMSF. The cell suspension can be lysed using 2-3 passages through a French Press,
homogenized using a Polytron (Brinkman Instruments) or sonicated on ice. Alternate
methods of lysing bacteria are apparent to those of skill in the art (see, e.g., Sambrook et al.,
supra; Ausubel et al., supra).

[0118] If necessary, the inclusion bodies are solubilized, and the lysed cell suspension is
typically centrifuged to remove unwanted insoluble matter. Proteins that formed the
inclusion bodies may be renatured by dilution or dialysis with a compatible buffer. Suitable
solvents include, but are not limited to, urea (from about 4 M to about 8 M), formamide (at
least about 80%, volume/volume basis), and guanidine hydrochloride (from about 4 M to
about 8 M). Some solvents which are capable of solubilizing aggregate-forming proteins, for
example SDS (sodium dodecyl sulfate) and 70% formic acid, are inappropriate for use in this
procedure due to the possibility of irreversible denaturation of the proteins, accompanied by a
lack of immunogenicity and/or activity. Although guanidine hydrochloride and similar
agents are denaturants, this denaturation is not irreversible and renaturation may occur upon
removal (by dialysis, for example) or dilution of the denaturant, allowing re-formation of
immunologically and/or biologically active protein. Other suitable buffers are known to
those skilled in the art. Endopeptidases are separated from other bacterial proteins by
standard separation techniques, e.g., with Ni-NTA agarose resin.

[0119] Alternatively, it is possible to purify the endopeptidases from the bacteria periplasm.
After lysis of the bacteria, when the endopeptidases are exported into the periplasm of the
bacteria, the periplasmic fraction of the bacteria can be isolated by cold osmotic shock in
addition to other methods known to those of skill in the art. To isolate recombinant proteins
from the periplasm, the bacterial cells are centrifuged to form a pellet. The pellet is
resuspended in a buffer containing 20% sucrose. To lyse the cells, the bacteria are
centrifuged and the pellet is resuspended in ice-cold 5 mM MgSO4 and kept in an ice bath for
approximately 10 minutes. The cell suspension is centrifuged and the supernatant decanted
and saved. The recombinant proteins present in the supernatant can be separated from the
host proteins by standard separation techniques well known to those of skill in the art.

[0120] Often as an initial step, particularly if the protein mixture is complex, an initial salt
fractionation can separate many of the unwanted host cell proteins (or proteins derived from
the cell culture media) from the recombinant protein of interest. The preferred salt is
ammonium sulfate. Ammonium sulfate precipitates proteins by effectively reducing the
amount of water in the protein mixture. Proteins then precipitate on the basis of their
solubility. The more hydrophobic a protein is, the more likely it is to precipitate at lower
ammonium sulfate concentrations. A typical protocol includes adding saturated ammonium
sulfate to a protein solution so that the resultant ammonium sulfate concentration is between
20-30%. This concentration will precipitate the most hydrophobic of proteins. The
precipitate is then discarded (unless the protein of interest is hydrophobic) and ammonium
sulfate is added to the supernatant to a concentration known to precipitate the protein of
interest. The precipitate is then solubilized in buffer and the excess salt removed if
necessary, either through dialysis or diafiltration. Other methods that rely on solubility of
proteins, such as cold ethanol precipitation, are well known to those of skill in the art and can
be used to fractionate complex protein mixtures.
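
By way of illustration only, the volume of saturated ammonium sulfate solution needed to
move a protein solution from one concentration to another can be estimated from a simple
mixing balance. The sketch below assumes that the percentages are percent of saturation and
that volumes are additive, neither of which is stated in this disclosure; the function name is
hypothetical. Under those assumptions, adding a volume V of saturated (100%) solution to a
volume V0 at s1 percent gives s2 percent when V = V0 (s2 - s1) / (100 - s2).

```python
def saturated_volume_to_add(initial_volume_ml, start_pct, target_pct):
    """Volume (mL) of saturated ammonium sulfate to add to reach target_pct.

    Assumes percent-of-saturation units and additive volumes (both assumptions).
    """
    if not 0 <= start_pct < target_pct < 100:
        raise ValueError("require 0 <= start_pct < target_pct < 100")
    return initial_volume_ml * (target_pct - start_pct) / (100.0 - target_pct)
```

For example, under these assumptions about 33 mL of saturated solution brings 100 mL of a
salt-free protein solution to 25%, which lies within the 20-30% range of the first cut
described above.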

[0121] The molecular weight of the endopeptidases can be used to isolate them from proteins
of greater and lesser size using ultrafiltration through membranes of different pore size (for
example, Amicon or Millipore membranes). As a first step, the protein mixture is
ultrafiltered through a membrane with a pore size that has a lower molecular weight cut-off
than the molecular weight of the protein of interest. The retentate of the ultrafiltration is then
ultrafiltered against a membrane with a molecular weight cut-off greater than the molecular
weight of the protein of interest. The recombinant protein will pass through the membrane
into the filtrate. The filtrate can then be chromatographed as described below.
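
By way of illustration only, the choice of membranes for the two ultrafiltration steps can be
expressed as picking, from the cut-offs on hand, the nearest cut-off below and the nearest
cut-off above the molecular weight of the protein of interest. The function name and the
example cut-off values are assumptions; in practice a margin between the cut-off and the
protein size is usually allowed, which this sketch does not model.

```python
def choose_membranes(protein_kd, cutoffs_kd):
    """Return (first-step cut-off, second-step cut-off) in kD.

    The first membrane has a cut-off below the protein size, so the protein is
    retained; the second has a cut-off above it, so the protein passes through.
    """
    below = [c for c in cutoffs_kd if c < protein_kd]
    above = [c for c in cutoffs_kd if c > protein_kd]
    if not below or not above:
        raise ValueError("need cut-offs both below and above the protein size")
    return max(below), min(above)
```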

[0122] The endopeptidases can also be separated from other proteins on the basis of their
size, net surface charge, hydrophobicity, and affinity for ligands. In addition, antibodies
raised against proteins can be conjugated to column matrices and the proteins
immunopurified. All of these methods are well known in the art. It will be apparent to one
of skill that chromatographic techniques can be performed at any scale and using equipment
from many different manufacturers (e.g., Pharmacia Biotech).

Nucleic Acids

[0123] In another aspect, the present invention provides an isolated nucleic acid encoding an
endopeptidase which site-specifically cleaves a peptide bond of a post-translationally
modified polypeptide at a site of post-translational modification and which comprises one or
more point mutations at one or more amino acid positions within the endopeptidase active
site.

[0124] In another exemplary embodiment, the isolated nucleic acid hybridizes under highly
stringent hybridization conditions to a nucleic acid sequence of Figure 2, wherein the
hybridization reaction is incubated at 42°C in a solution comprising 50% formamide, 5x SSC
and 1% SDS, and washed at 65°C in a solution comprising 0.2x SSC and 0.1% SDS. In an
exemplary embodiment, the nucleic acid also encodes an endopeptidase containing at least
one amino acid substitution selected from P129G, E156R, S191K, G166K, G127S, E156K,
P129K, P129R, S159R, and E156G (see Figure 1). In another related embodiment, the
nucleic acid encodes an endopeptidase containing one of the following combinations of
substitution point mutations: G127S and E156R; P129G and E156R; P129G and E156K;
E156R and S191K; P129K and E156R; P129R and E156K; E156K and G166K; E156K and
S191K; E156K and S191K and S159R; P129R and E156R; and E156G and G166K. In
another related embodiment, the nucleic acid encodes an endopeptidase containing one or
two amino acid substitutions selected from P129G, E156R, S191K, G166K, and G127S. In
another related embodiment, the nucleic acid to which the isolated nucleic acid hybridizes
under highly stringent hybridization conditions additionally contains a nucleic acid sequence
encoding a signal sequence as shown in Figure 8 (in bold). In another related embodiment,
the nucleic acid to which the isolated nucleic acid hybridizes under highly stringent
hybridization conditions additionally contains a nucleic acid sequence encoding a prodomain
as shown in Figure 8 (underlined).

[0125] In an exemplary embodiment, the isolated nucleic acid contains a subsequence
having at least 70% nucleic acid sequence identity to a nucleic acid sequence of Figure 2. In
a related embodiment, the nucleic acid has 75%, 76%, 77%, 78%, 79%, 80%, 85%, 86%,
87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, or 98% amino acid sequence
identity. In a related embodiment, the nucleic acid to which the isolated nucleic acid
hybridizes under highly stringent hybridization conditions additionally contains a nucleic acid
sequence encoding a signal sequence as shown in Figure 8 (in bold). In another related
embodiment, the nucleic acid to which the isolated nucleic acid hybridizes under highly
stringent hybridization conditions additionally contains a nucleic acid sequence encoding a
prodomain as shown in Figure 8 (underlined).

[0126] In another related embodiment, the nucleic acid contains a subsequence as described
above and encodes an endopeptidase containing a subsequence as described above and
contains at least one amino acid substitution selected from P129G, E156R, S191K, G166K,
G127S, E156K, P129K, P129R, S159R, and E156G (see Figure 1). In another related
embodiment, the nucleic acid contains a subsequence as described above and encodes an
endopeptidase containing one of the following combinations of substitution point mutations:
G127S and E156R; P129G and E156R; P129G and E156K; E156R and S191K; P129K and
E156R; P129R and E156K; E156K and G166K; E156K and S191K; E156K and S191K and
S159R; P129R and E156R; and E156G and G166K. In another related embodiment, the
nucleic acid contains a subsequence as described above and encodes an endopeptidase
containing one or two amino acid substitutions selected from P129G, E156R, S191K,
G166K, and G127S.

[0127] The present invention also provides expression vectors containing the above nucleic
acids and host cells transfected with the vectors.

Informatics 

[0128] The methods described above will produce valuable data regarding the location of
post-translational modifications on polypeptides. The data may be provided in a variety of
dataset forms, such as polypeptide fragment sequences, polypeptide fragmentation patterns
(as deduced, for example, from two-dimensional gel electrophoresis or mass spectrometry),
polypeptide fragment elution patterns (for example, from HPLC columns or capillary gel
electrophoresis columns), and the like. Thus, the site of post-translational modification can
be determined, for example, by comparing the mass spectral or two-dimensional
electrophoretic fragmentation pattern of a degraded post-translationally modified polypeptide
using informatic techniques. In an exemplary embodiment, the informatic technique includes
comparing the mass spectral or two-dimensional electrophoretic fragmentation pattern of a
degraded post-translationally modified polypeptide to a known or predicted fragmentation
pattern of the polypeptide using the methodologies disclosed below.
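
By way of illustration only, comparing an observed fragmentation pattern with a predicted
one can be sketched as a mass-matching test: predict the two fragments expected if cleavage
occurs at a proposed modification site, then ask whether both masses appear among the
observed masses. In the sketch below the function names and the matching tolerance are
hypothetical, cleavage is assumed to occur on the C-terminal side of the modified residue,
and the modification is assumed to be a single phosphate (+79.96633 Da); none of these
choices is taken from this disclosure. Residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PHOSPHATE = 79.96633  # mass added by one phosphorylation


def peptide_mass(sequence, phosphates=0):
    """Neutral monoisotopic mass of a peptide carrying the given phosphates."""
    return sum(MONO[aa] for aa in sequence) + WATER + PHOSPHATE * phosphates


def predicted_fragment_masses(sequence, site_index):
    """Masses of the two fragments from cleavage after the modified residue.

    site_index is the 0-based index of the modified residue; the modified
    residue stays on the N-terminal fragment (an assumed cleavage geometry).
    """
    n_fragment = sequence[: site_index + 1]
    c_fragment = sequence[site_index + 1 :]
    if not c_fragment:
        raise ValueError("site is the last residue; no C-terminal fragment")
    return peptide_mass(n_fragment, phosphates=1), peptide_mass(c_fragment)


def site_supported(sequence, site_index, observed_masses, tolerance=0.01):
    """True when both predicted fragment masses match an observed mass."""
    return all(
        any(abs(predicted - observed) <= tolerance for observed in observed_masses)
        for predicted in predicted_fragment_masses(sequence, site_index)
    )
```

Applying the test to each possible site in turn gives a simple way to rank proposed
modification sites against a single observed pattern.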

[0129] As high-resolution, high-sensitivity datasets acquired using the methods of the
invention become available to the art, significant progress in the areas of diagnostics,
therapeutics, drug development, biosensor development, and other related areas will occur.
For example, disease markers can be identified and utilized for better confirmation of a
disease condition or stage (see, U.S. Patent No. 5,672,480; 5,599,677; 5,939,533; and
5,710,007). Subcellular toxicological information can be generated to better direct drug
structure and activity correlation (see, Anderson, L., "Pharmaceutical Proteomics: Targets,
Mechanism, and Function," paper presented at the IBC Proteomics conference, Coronado,
CA (June 11-12, 1998)). Subcellular toxicological information can also be utilized in a
biological sensor device to predict the likely toxicological effect of chemical exposures and
likely tolerable exposure thresholds (see, U.S. Patent No. 5,811,231). Similar advantages
accrue from datasets relevant to other biomolecules and bioactive agents (e.g., nucleic acids,
saccharides, lipids, drugs, and the like).

[0130] Thus, in an exemplary embodiment, the present invention provides a database that
includes at least one set of assay data. The data contained in the database is acquired
using a method of the invention. The database can be in substantially any form in which data
can be maintained and transmitted, but is preferably an electronic database. The electronic
database of the invention can be maintained on any electronic device allowing for the storage
of and access to the database, such as a personal computer, but is preferably distributed on a
wide area network, such as the World Wide Web.
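
By way of illustration only, an electronic database of this kind could be as simple as a
single relational table recording each mapped site. The sketch below uses the SQLite engine
from the Python standard library; the table name, column names, and function names are
hypothetical and are not taken from this disclosure.

```python
import sqlite3


def create_database(path=":memory:"):
    """Open (or create) a database holding mapped modification sites."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cleavage_site (
               id INTEGER PRIMARY KEY,
               protein TEXT NOT NULL,
               residue_position INTEGER NOT NULL,
               modification TEXT NOT NULL,
               endopeptidase TEXT NOT NULL
           )"""
    )
    return conn


def add_site(conn, protein, residue_position, modification, endopeptidase):
    """Record one mapped site of post-translational modification."""
    conn.execute(
        "INSERT INTO cleavage_site "
        "(protein, residue_position, modification, endopeptidase) "
        "VALUES (?, ?, ?, ?)",
        (protein, residue_position, modification, endopeptidase),
    )
    conn.commit()


def sites_for_protein(conn, protein):
    """Return (position, modification) rows for a protein, ordered by position."""
    cursor = conn.execute(
        "SELECT residue_position, modification FROM cleavage_site "
        "WHERE protein = ? ORDER BY residue_position",
        (protein,),
    )
    return cursor.fetchall()
```

A file path may be passed in place of the in-memory default when the data must persist, and
the same schema could be served over a network by a client-server database engine.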

[0131] The compositions and methods described herein may be used to identify sites of
post-translational modifications, or a lack thereof, on a variety of polypeptides from a diverse
array of sources. Such methods provide an abundance of information, which can be
correlated with pathological conditions, predisposition to disease, drug testing, therapeutic
monitoring, gene-disease causal linkages, identification of correlates of immunity and
physiological status, among others. Although the data generated from the methods of the
invention is suited for manual review and analysis, in an exemplary embodiment, prior data
processing using high-speed computers is utilized.

[0132] An array of methods for indexing and retrieving biomolecular information is known
in the art. For example, U.S. Patents 6,023,659 and 5,966,712 disclose a relational database
system for storing biomolecular sequence information in a manner that allows sequences to
be catalogued and searched according to one or more protein function hierarchies. U.S.
Patent 5,953,727 discloses a relational database having sequence records containing
information in a format that allows a collection of partial-length DNA sequences to be
catalogued and searched according to association with one or more sequencing projects for
obtaining full-length sequences from the collection of partial length sequences. U.S. Patent
5,706,498 discloses a gene database retrieval system for making a retrieval of a gene
sequence similar to a sequence data item in a gene database based on the degree of similarity
between a key sequence and a target sequence. U.S. Patent 5,538,897 discloses a method
using mass spectroscopy fragmentation patterns of peptides to identify amino acid sequences
in computer databases by comparison of predicted mass spectra with experimentally-derived
mass spectra using a closeness-of-fit measure. U.S. Patent 5,926,818 discloses a
multi-dimensional database comprising a functionality for multi-dimensional data analysis
described as on-line analytical processing (OLAP), which entails the consolidation of
projected and actual data according to more than one consolidation path or dimension. U.S.
Patent 5,295,261 reports a hybrid database structure in which the fields of each database
record are divided into two classes, navigational and informational data, with navigational
fields stored in a hierarchical topological map which can be viewed as a tree structure or as
the merger of two or more such tree structures.

[0133] The present invention provides a computer database comprising a computer and
software for storing in computer-retrievable form assay data records cross-tabulated, for
example, with data specifying the source of the post translationally modified polypeptide
and/or the host cell or organism from which each sequence specificity record was obtained.

[0134] In an exemplary embodiment, at least one of the sources of the post translationally
modified polypeptide is from a tissue sample known to be free of pathological disorders. In a
variation, at least one of the sources is a known pathological tissue specimen, for example, a
neoplastic lesion or a tissue specimen containing a pathogen such as a virus, bacteria or the
like. In another variation, the assay records cross-tabulate one or more of the following
parameters for each target species in a sample: (1) a unique identification code, which can
include, for example, a target molecular structure and/or characteristic separation coordinate
(e.g., electrophoretic coordinates); (2) sample source; and (3) absolute and/or relative
quantity of the target species present in the sample.
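
As a purely illustrative sketch of the cross-tabulated assay record described above, the
following Python fragment models the three listed parameters as fields of a record and groups
records by sample source. The class name, field names, and sample values are hypothetical and
are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayRecord:
    """One hypothetical assay record for a target species in a sample."""
    target_id: str            # (1) unique identification code
    sample_source: str        # (2) sample source
    relative_quantity: float  # (3) relative quantity of the target species


def cross_tabulate(records):
    """Return {sample_source: {target_id: relative_quantity}}."""
    table = defaultdict(dict)
    for rec in records:
        table[rec.sample_source][rec.target_id] = rec.relative_quantity
    # Convert to plain dicts so that missing keys raise instead of being created.
    return {source: dict(targets) for source, targets in table.items()}


# Made-up example values, for illustration only.
records = [
    AssayRecord("pep-001", "normal tissue", 1.0),
    AssayRecord("pep-001", "neoplastic lesion", 3.5),
    AssayRecord("pep-002", "neoplastic lesion", 0.4),
]
table = cross_tabulate(records)
```

Keying the outer table on sample source makes it straightforward to compare the same target
between, for example, a normal and a pathological specimen.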

[0135] The invention also provides for the storage and retrieval of a collection of target
data in a computer data storage apparatus, which can include magnetic disks, optical disks,
magneto-optical disks, DRAM, SRAM, SGRAM, SDRAM, RDRAM, DDR RAM, magnetic
bubble memory devices, and other data storage devices, including CPU registers and on-CPU
data storage arrays. Typically, the data records are stored as a bit pattern in an array of
magnetic domains on a magnetizable medium or as an array of charge states or transistor gate
states, such as an array of cells in a DRAM device (e.g., each cell comprised of a transistor
and a charge storage area, which may be on the transistor). In one embodiment, the invention
provides such storage devices, and computer systems built therewith, comprising a bit pattern
encoding a protein expression fingerprint record comprising unique identifiers for at least 10
polypeptide data records cross-tabulated with polypeptide sources.

[0136] In an exemplary embodiment, the invention provides a method for identifying
post-translationally modified sites, or a lack thereof, on related polypeptides, comprising
performing a computerized comparison between a polypeptide sequence assay record stored
in or retrieved from a computer storage device or database and at least one other sequence.
The comparison can include a sequence analysis or comparison algorithm or computer
program embodiment thereof (e.g., FASTA, TFASTA, GAP, BESTFIT) and/or the
comparison may be of the relative amount of a polypeptide sequence in a pool of sequences
determined from a polypeptide sample.

[0137] The invention also preferably provides a magnetic disk, such as an IBM-compatible
(DOS, Windows, Windows95/98/2000, Windows NT, OS/2) or other format (e.g., Linux,
SunOS, Solaris, AIX, SCO Unix, VMS, MV, Macintosh, etc.) floppy diskette or hard (fixed,
Winchester) disk drive, comprising a bit pattern encoding data from an assay of the invention
in a file format suitable for retrieval and processing in a computerized sequence analysis,
comparison, or relative quantitation method.

[0138] The invention also provides a network, comprising a plurality of computing devices
linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line, ISDN
line, wireless network, optical fiber, or other suitable signal transmission medium, whereby at
least one network device (e.g., computer, disk array, etc.) comprises a pattern of magnetic
domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM cells)
composing a bit pattern encoding data acquired from an assay of the invention.

[0139] The invention also provides a method for transmitting assay data that includes
generating an electronic signal on an electronic communications device, such as a modem,
ISDN terminal adapter, DSL, cable modem, ATM switch, or the like, wherein the signal
includes (in native or encrypted format) a bit pattern encoding data from an assay or a
database comprising a plurality of assay results obtained by the method of the invention.

[0140] In an exemplary embodiment, the invention provides a computer system for
comparing a post translationally modified polypeptide to a database containing an array of
data structures, such as an assay result obtained by the method of the invention, and ranking
database polypeptide targets based on the degree of identity and gap weight to the target data.
A central processor is preferably initialized to load and execute the computer program for
alignment and/or comparison of the assay results. Data for a polypeptide target is entered
into the central processor via an I/O device. Execution of the computer program results in the
central processor retrieving the assay data from the data file, which comprises a binary
description of an assay result.

[0141] The data or record and the computer program can be transferred to secondary
memory, which is typically random access memory (e.g., DRAM, SRAM, SGRAM, or
SDRAM). The polypeptide targets are ranked according to the degree of correspondence
between a selected assay characteristic (e.g., binding to a selected binding functionality) and
the same characteristic of the post translationally modified polypeptide target and results are
output via an I/O device. For example, a central processor can be a conventional computer
(e.g., Intel Pentium, PowerPC, Alpha, PA-8000, SPARC, MIPS 4400, MIPS 10000, VAX,
etc.); a program can be a commercial or public domain molecular biology software package
(e.g., UWGCG Sequence Analysis Software, Darwin); a data file can be an optical or
magnetic disk, a data server, a memory device (e.g., DRAM, SRAM, SGRAM, SDRAM,
EPROM, bubble memory, flash memory, etc.); an I/O device can be a terminal comprising a
video display and a keyboard, a modem, an ISDN terminal adapter, an Ethernet port, a
punched card reader, a magnetic strip reader, or other suitable I/O device.

[0142] The invention also preferably provides the use of a computer system, such as that
described above, which comprises: (1) a computer; (2) a stored bit pattern encoding a
collection of peptide sequence specificity records obtained by the methods of the invention,
which may be stored in the computer; (3) a comparison post translationally modified
polypeptide target; and (4) a program for alignment and comparison, typically with
rank-ordering of comparison results on the basis of computed similarity values.
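
The comparison and rank-ordering step can be sketched as follows. This is a minimal
illustration that scores ungapped, position-by-position identity between a query and stored
sequences of equal length; it does not implement FASTA, TFASTA, GAP, or BESTFIT, and the
function names, record identifiers, and sequences are invented for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def rank_by_similarity(query: str, records: dict) -> list:
    """Return (record_id, percent identity) pairs, most similar first."""
    scored = [(rec_id, percent_identity(query, seq)) for rec_id, seq in records.items()]
    # Sort by descending score; ties are broken by record id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical stored records and query sequence.
records = {
    "rec-1": "AAPYA",
    "rec-2": "AAPFA",
    "rec-3": "GGPFG",
}
ranking = rank_by_similarity("AAPFA", records)
```

A production system would substitute a gapped alignment score for `percent_identity`; the
ranking logic would remain the same.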



[0143] The present invention also provides a kit for practicing a method set forth herein. In
an exemplary embodiment, the kit includes one or more components useful to practice the
method of the invention and instructions for using that component to practice the method of
the invention.

[0144] In a preferred embodiment, the kit includes a container of an endopeptidase of the
present invention and instructions for using the endopeptidase to determine sites of
post-translational modification on the polypeptide. The examples that follow are intended to
further illustrate the invention, not to limit the scope of the invention.

[0145] The terms and expressions which have been employed herein are used as terms of
description and not of limitation, and there is no intention in the use of such terms and
expressions of excluding equivalents of the features shown and described, or portions thereof,
it being recognized that various modifications are possible within the scope of the invention
claimed. Moreover, any one or more features of any embodiment of the invention may be
combined with any one or more other features of any other embodiment of the invention,
without departing from the scope of the invention. For example, the endonucleases described
in the endonuclease section are equally applicable to the informatics methods described
herein. All publications, patents, and patent applications cited herein are hereby incorporated
by reference in their entirety for all purposes.



Materials 

[0146] The BG2036 protease deficient strain of B. subtilis and the pSS5 shuttle vector
containing the subtilisin BPN' gene were employed. All pNA tetrapeptide substrates were
from Bachem or Sigma. All Fmoc amino acids were from Bachem, Novabiochem or
Advanced Chemtech. All other reagents were from Sigma unless noted.

Example 1 

[0147] Example 1 describes a method for identifying candidate amino acid positions in a
model endopeptidase by comparing the three-dimensional structures of the model
endopeptidase and a post-translationally modified polypeptide. In addition, the structures of
potential candidate endopeptidases are compared with a post-translationally modified
polypeptide. In this example, the post-translationally modified polypeptide is a
phosphotyrosine polypeptide and the model endopeptidase is a subtilisin containing a
sub-sequence of Figure 7.

[0148] Candidate amino acid positions were identified by comparing the three-dimensional
model of a phosphotyrosine polypeptide and the subtilisin endopeptidase (see Figure 3). As
seen in Figure 3, the phosphotyrosine moiety sterically clashes with proline 129 (mesh) and
unfavorably interacts with glutamate 156. Three-dimensional models of potential candidate
subtilisin endopeptidases were also generated to assess the ability of various amino acids to
bind to the phosphotyrosine polypeptide when introduced into the candidate amino acid
positions. The subtilisin endopeptidase model was based on the known crystal structure. The
phosphotyrosine polypeptide structure was predicted based on the primary sequence. The
structures of subtilisin, the potential candidate subtilisins and the phosphotyrosine substrates
were built using the biopolymer function within the InsightII software package starting from
the PDB file 1SUA on a Silicon Graphics O2 workstation. Backbone atoms were left fixed
and reasonable side chain rotamers were evaluated using the Bump function to check for
intermolecular and intramolecular steric clashes.

[0149] The substitution point mutations of the resulting candidate subtilisins are shown in
the abscissa of the graph of Figure 4.

Example 2 

[0150] Example 2 describes methods of constructing and purifying exemplary candidate
endopeptidases. The candidate endopeptidases in this example are derived from the subtilisin
model endopeptidase as described in Example 1.

[0151] Substitution point mutations as shown in Figure 4 were introduced into the subtilisin
gene in the pSS5 vector using the Quikchange protocol for PCR mutagenesis (Stratagene).
All mutations were confirmed by dideoxy sequencing. Monomer plasmid DNA was
transformed into a RecA+ strain of E. coli (JM101, Stratagene) to prepare multimeric
plasmids. This plasmid DNA was used to transform a protease deficient strain (BG2036) of
B. subtilis (Kunst, 1993). Transformants were selected on 12.5 µg/ml chloramphenicol and
restreaked on 1% skim milk plates to confirm protease activity.

[0152] Subtilisin candidate endopeptidases were purified essentially by the method of
Estell. In brief, 500 ml 2xYT (12.5 µg/ml chloramphenicol) was inoculated with 5 ml of an
overnight culture and allowed to grow for 24 hours at 37 °C. Cells were pelleted by
centrifugation and one equivalent (500 ml) of -20 °C ethanol was added to the supernatant.
The supernatant was centrifuged for 15 minutes at 8000 rpm, and the pellet was discarded. A
second equivalent of -20 °C ethanol (1000 ml) was added to the supernatant, which was then
left overnight at -20 °C. The resulting supernatant was centrifuged for 15 minutes at 5000
rpm and the supernatant was discarded. The pellet was resuspended in a minimal volume (2
to 3 ml) of 50 mM Tris, pH 8.0, 5 mM CaCl2 and clarified at 18,000g for 30 minutes. The
supernatant was then removed and precipitated overnight at 4 °C with 3.5 volumes of 90%
saturated ammonium sulphate. The ammonium sulphate pellet was collected by
centrifugation and resuspended in 2-3 ml of 25 mM MES, 5 mM CaCl2, pH 5.5 and dialyzed
at 4 °C in the same buffer for 24 hours (3 x 1 L). At this stage, protein preparations were
typically >75% pure as judged by SDS-PAGE. For many mutants, the dialysate was then
loaded onto a Mono S column attached to a Biocad FPLC and eluted using the same buffer
with a gradient of 0-500 mM NaCl. Fractions were collected, aliquoted, flash frozen in liquid
nitrogen and stored at -80 °C.

Example 3 

[0153] Example 3 describes the synthesis of a series of test polypeptides. In this example,
the test polypeptides comprise a fluorescent donor-quencher pair.

[0154] Test polypeptides were synthesized using standard Fmoc peptide synthesis protocols
starting from Wang resin preloaded with Fmoc-Asp(O-tBu). For the sulphotyrosine peptide,
a 2-chlorotrityl resin was utilized combined with a low temperature cleavage and
deprotection (10 hours at 0 °C) to overcome the inherent acid lability of the tyrosine sulphate.
All peptides were purified to >95% by reverse phase HPLC utilizing an
acetonitrile/water/0.1% TFA solvent system and characterized by electrospray MS on a Perkin
Elmer mass spectrometer.

[0155] The resulting test polypeptides are shown in Figure 5, wherein Xxx represents a
phosphotyrosine, sulfonyl tyrosine, tyrosine, phenylalanine, phosphoserine,
phosphothreonine, alanine, valine, leucine, isoleucine, aspartic acid, glutamic acid, arginine,
or lysine as shown. The data in panel A was obtained using a test polypeptide containing a
succinyl-paranitroanilide fluorogenic donor-acceptor pair. The data in panel B was obtained
using a test polypeptide containing an aminobenzoic acid-tyrosine(NO2)-aspartic acid
fluorogenic donor-acceptor pair.

Example 4 

[0156] Example 4 demonstrates a method for identifying an endopeptidase that
site-specifically cleaves a peptide bond of a post-translationally modified polypeptide. The
methods involve assaying the candidate subtilisins of Example 2 with the test polypeptides of
Example 3.

[0157] Kinetics for the fluorogenic substrates of the series Abz-Phe-Arg-Pro-Xxx-Gly-Phe-Y(NO2)-Asp
were measured in 50 mM Bicine, 2 mM CaCl2, pH 8.5 at 25 °C by monitoring
fluorescence at 420 nm upon excitation at 320 nm using an instrument. Initial rate data from 8
substrate concentrations bracketing the KM were measured in triplicate and fit directly to the
Michaelis-Menten equation using the Prism software package (GraphPad, ). When it was not
possible to saturate the enzyme, values for kcat/KM were obtained from initial rates at low
substrate concentrations (10[S] < KM) using the relationship kcat/KM = v0/([E][S]). Kinetics for
tetrapeptide substrates of the series Suc-Ala-Ala-Pro-Xxx-pNA were measured by monitoring the
change in absorbance at 412 nm over time using a Uvikon spectrophotometer. Protein concentrations
were determined spectrophotometrically using an extinction coefficient of 32.2 mM^-1 cm^-1 at
280 nm (Matsubara, 1965).
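
The two calculations referred to above can be sketched as follows. When [S] is much smaller
than KM, the Michaelis-Menten equation v0 = kcat[E][S]/(KM + [S]) reduces to
v0 = (kcat/KM)[E][S], so the slope of v0 against [S] divided by [E] estimates kcat/KM. The
enzyme concentration is taken from the absorbance at 280 nm using the extinction coefficient
quoted above. The function names, the 1 cm path length, and all numerical inputs are
assumptions made for illustration; they are not data from the Examples.

```python
EPSILON_280 = 32.2  # mM^-1 cm^-1, extinction coefficient quoted in the text


def enzyme_conc_mM(a280: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l). The 1 cm path length is an assumption."""
    return a280 / (EPSILON_280 * path_cm)


def kcat_over_km(substrate, rates, enzyme):
    """Estimate kcat/KM from initial rates measured at [S] << KM.

    Fits v0 = slope * [S] by least squares through the origin and divides the
    slope by [E]. Units follow the inputs: with rates in uM/s and concentrations
    in uM, the result is in uM^-1 s^-1.
    """
    if len(substrate) != len(rates) or not substrate:
        raise ValueError("need matching, non-empty substrate and rate lists")
    slope = sum(s * v for s, v in zip(substrate, rates)) / sum(s * s for s in substrate)
    return slope / enzyme


# Hypothetical numbers, chosen only to exercise the functions.
enzyme_uM = enzyme_conc_mM(0.0322) * 1000.0   # A280 of 0.0322 corresponds to about 1 uM
substrate_uM = [1.0, 2.0, 4.0]
rates_uM_per_s = [0.02, 0.04, 0.08]           # exactly linear in [S]
estimate = kcat_over_km(substrate_uM, rates_uM_per_s, enzyme_uM)
```

Fitting through the origin reflects the fact that the rate is zero in the absence of
substrate; a full Michaelis-Menten fit is still preferable whenever the enzyme can be
saturated.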

Example 5

[0158] Example 5 demonstrates that subtilisin endopeptidases that site-specifically cleave a
phosphotyrosine polypeptide at the phosphorylated tyrosine are obtained using the methods
of the present invention, as demonstrated in Examples 1-4.

[0159] Figure 4 illustrates the phosphotyrosine site-specificity of the candidate subtilisin
endopeptidases and the model subtilisin endopeptidase against either an unmodified tyrosine
or phenylalanine. As shown in Figure 4, subtilisin endopeptidases containing the following
substitution point mutations were found to preferably cleave at the phosphotyrosine residue
over a tyrosine residue or phenylalanine residue: G127S and E156R, P129G and E156R,
P129G and E156K, E156R and S191K, P129K and E156R, P129R and E156K, E156K and
G166K, E156K and S191K, E156K and S191K and S159R, P129R and E156R, and E156G
and G166K.
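
The substitution names used above follow the common convention of original residue,
position, and replacement residue in one-letter amino acid code (for example, P129G denotes
proline 129 replaced by glycine). A minimal sketch of parsing this notation is given below;
the function name and the returned tuple layout are illustrative choices and are not part of
the specification.

```python
import re

# One-letter codes of the 20 standard amino acids.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_SUBSTITUTION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def parse_substitution(name: str):
    """Split a point-mutation name such as 'P129G' into (original, position, replacement)."""
    match = _SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"not a substitution name: {name!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


# The double mutant named in connection with Figure 5.
double_mutant = [parse_substitution(m) for m in ("P129G", "E156R")]
```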

[0160] Figure 5 shows kinetic data for the site-specific cleavage at a phosphotyrosine by a
subtilisin endopeptidase containing the substitution point mutations P129G and E156R.

[0161] Figure 6 shows kinetic data for the site-specific cleavage at a phosphotyrosine by a
subtilisin endopeptidase containing the substitution point mutations G127S and E156R.

WHAT IS CLAIMED IS:

1. A method for mapping a site of post-translational modification on a
post-translationally modified polypeptide, said method comprising:

(a) site-specifically cleaving a peptide bond of the post-translationally
modified polypeptide with an endopeptidase at said site of post-translational modification to
produce a degraded post-translationally modified polypeptide; and

(b) after step (a), determining said site of post-translational modification.

2. The method of claim 1, wherein said post-translational modification is
selected from phosphorylation, sulfonation, glycosylation, acetylation, methylation,
ADP-ribosylation, methionine oxidation, cysteine oxidation, and cysteine lipidation.

3. The method of claim 1, wherein said post-translational modification is
phosphorylation of an amino acid selected from tyrosine, serine, and threonine.

4. The method of claim 1, wherein said post-translational modification is
sulfonation of a tyrosine.

5. The method of claim 1, wherein said site of post-translational
modification is determined by a method comprising determining the mass spectrometry
fragmentation pattern of the degraded post-translationally modified polypeptide.

6. The method of claim 1, wherein said endopeptidase is a serine protease
comprising an active site that specifically binds to said post-translational modification.

7. The method of claim 6, wherein said serine protease is subtilisin.

8. A serine protease which site-specifically cleaves a peptide bond of a
post-translationally modified polypeptide at a site of post-translational modification, wherein
said serine protease comprises an active site that binds to said site of post-translational
modification.

9. The serine protease of claim 8, wherein said post-translational
modification is selected from phosphorylation, sulfonation, glycosylation, and acetylation.



WO 2004/016752 PCT/US2003/025456



10. The serine protease of claim 8, wherein said post-translational
modification is phosphorylation of an amino acid selected from tyrosine, serine, and
threonine.

11. The serine protease of claim 8, wherein said post-translational
modification is sulfonation of a tyrosine.

12. The serine protease of claim 8, wherein said serine protease is
subtilisin.

13. The serine protease of claim 8, wherein said serine protease is encoded
by a nucleic acid sequence that hybridizes under highly stringent hybridization conditions to
a nucleic acid encoding a polypeptide comprising an amino acid sequence of Figure 1,
wherein the hybridization reaction is incubated at 42°C in a solution comprising 50%
formamide, 5x SSC and 1% SDS, and washed at 65°C in a solution comprising 0.2x SSC and
0.1% SDS.

14. The serine protease of claim 8, wherein said serine protease comprises
a subsequence having at least 70% amino acid sequence identity to an amino acid sequence
of Figure 1.

15. The serine protease of claim 8, wherein said serine protease comprises
a subsequence having at least 70% amino acid sequence identity to an amino acid sequence
of Figure 1 and contains at least one amino acid substitution selected from P129G, E156R,
S191K, G166K, and G127S.

16. The serine protease of claim 8, wherein said serine protease is encoded
by an expression vector.

17. A host cell comprising the expression vector of claim 16.

18. An endopeptidase that site-specifically cleaves a peptide bond of a
post-translationally modified polypeptide at a site of post-translational modification, said
endopeptidase produced by a method comprising:

(a) introducing one or more point mutations to a model endopeptidase at one
or more candidate amino acid positions in an active site of said model endopeptidase to
produce a plurality of candidate endopeptidases, wherein at least one of said plurality of
candidate endopeptidases is an endopeptidase that site-specifically cleaves a peptide bond of
a post-translationally modified polypeptide at a site of post-translational modification; and

(b) identifying said endopeptidase that site-specifically cleaves said peptide bond of
said post-translationally modified polypeptide at said site of post-translational
modification.

19. The endopeptidase of claim 18, wherein said model endopeptidase
comprises a subsequence having at least 70% amino acid sequence identity to an amino acid
sequence of Figure 1.

20. The endopeptidase of claim 18, wherein said model endopeptidase is
encoded by a nucleic acid sequence that hybridizes under highly stringent hybridization
conditions to a nucleic acid encoding a polypeptide comprising an amino acid sequence of
Figure 1, wherein the hybridization reaction is incubated at 42°C in a solution comprising
50% formamide, 5x SSC and 1% SDS, and washed at 65°C in a solution comprising 0.2x
SSC and 0.1% SDS.

21. The endopeptidase of claim 18, wherein said one or more candidate
amino acid positions is selected from E156, S191, G166, and G127.

22. The endopeptidase of claim 18, wherein said one or more candidate
amino acid positions is P129 and said point mutation is a glycine or alanine substitution.

23. The endopeptidase of claim 18, wherein said one or more candidate
amino acid positions is E156 and said point mutation is an arginine substitution.

24. The endopeptidase of claim 18, wherein said one or more candidate
amino acid positions is E156 and said point mutation is a lysine substitution.

25. The endopeptidase of claim 18, wherein said one or more candidate
amino acid positions is P129 and E156, wherein said point mutation is glycine at P129 and
arginine at E156.

26. The endopeptidase of claim 18, wherein, before step (a), said one or
more candidate amino acid positions are identified by a method comprising:

(i) generating a three-dimensional structure of said model endopeptidase
active site;

(ii) generating a three-dimensional structure of said post-translationally
modified polypeptide;

(iv) comparing the three-dimensional structure of said model endopeptidase
active site with said post-translationally modified polypeptide, thereby identifying one or
more candidate amino acid positions that, upon introduction of one or more point mutations
at one or more of said candidate amino acid positions, produces a plurality of candidate
endopeptidases, wherein at least one of said plurality of candidate endopeptidases is said
endopeptidase that site-specifically cleaves said peptide bond of said post-translationally
modified polypeptide at said site of post-translational modification.

27. An isolated nucleic acid encoding an endopeptidase which
site-specifically cleaves a peptide bond of a post-translationally modified polypeptide at a site of
post-translational modification and which comprises one or more point mutations at one or
more amino acid positions within the endopeptidase active site,

wherein said isolated nucleic acid hybridizes under highly stringent
hybridization conditions to a nucleic acid sequence of Figure 2, wherein
the hybridization reaction is incubated at 42°C in a solution comprising
50% formamide, 5x SSC and 1% SDS, and washed at 65°C in a solution
comprising 0.2x SSC and 0.1% SDS.

28. An expression vector comprising the nucleic acid of claim 27.

29. A host cell transfected with the vector of claim 28.

30. An isolated nucleic acid encoding an endopeptidase which
site-specifically cleaves a polypeptide backbone amide bond of a post-translationally modified
polypeptide at a site of post-translational modification and which comprises one or more
point mutations at one or more amino acid positions within the endopeptidase active site,

wherein said isolated nucleic acid comprises a subsequence having at least
70% nucleic acid sequence identity to a nucleic acid sequence of Figure 2.

31. An expression vector comprising the nucleic acid of claim 30.

32. A host cell transfected with the vector of claim 30.

FIG. 1

FIG. 2



GCGCAGTCCGTGCCTTACGGCGTATCACAAATTAAAGCCCCTGCTC 

TGCACTCTCAAGGCTACACTGGATCAAATGTTAAAGTAGCGOTTAT 

CGACAGCGGTA.TCGATTCTTCTCATCCTGATTTAAAGGTAGCAGGC 

GGAGCCAGCATGGTTCCTTCIGAAACAAATCCTTTCCAAGACAAC 

AACTCTCACGOAACTCACGTTGCCGGCACAGTTGCGGCTCTTAATA 

ACTCAATCGGTGTATTAGGCGTTGCGCCAAGCGCATCACTTTACGC 

TGTAAAAGTTCTCGGTGCTGACGGTTCCGGCCAATACAGCTGGATC 

ATTAACGGAArCGAGTGGGCX3ATCGC.AAACAATATGGACGXTATT 

AACATGAGCCTCGGCGGACCTTCTGG'TTCTGCTGCTTTAAAAGCGG 

CAGTTGATAAA.GCCGTTGCATCCGGCX3TCGTAGTCGTTGCGCjCAGC 

CGGTAACGAAGGCACTTCCGGCAGCTCAAGCACAGTGGGCTACCC 

TGGTAAATACCCTTCTGTCATTGCAGrAGGCGCTGTTGACAGCAGC 

AACCAAAGAGCATCTTTCTCAAGCGTiA.GGACCTGAGCTTGA.TGTC 

ATGGCACCTGG-CGTATCTATCCAAAGCACGCTTCCTGGAAACAAA 

TACGGGGCGTA^CAACGGTACGTCAATOGCATCTCCGCACOrTGCC 

GGAGCGGCTGCTTTGATTCTTTCTAAGJCACCCGAACTGGACAAACA 

CTCAAGTCCGCJ^GCAGTTTAGAAAAC^CCACTACAAAACrrOGTG 

ATTCTTTCTACXATGGAAAAGGGCTGATCAACGTACAGGCGOCAG 

CTCAGTAA 

FIG. 4

FIG. 5

FIG. 6

FIG. 7

FIG. 8



GTGAGAGGCAAAAAAGTATGGATCAGTTrGCTGTTTGC 

TTXAGCGTTAATCTTTACGATGGCGTTCGGCAGCACAT 

rf^'TrrfyrrrAGGrfi nCAGGGAAATCAAArC^GGQAAAAG 

AAATATATTGTCGGGTTTAAACAGACAATGAGCACGATGA 

GCGCCGCTAAGAAGA A AGATGTCATTTCTGAAAAAGGCG 

GGAAAGTGCAAAAGCA^TTCAAATATGTAG-ACGCAGCTTC 

AGCn-ACATTAAACGAAAAAGCTGTAAAAGA ^TTGAAAAA 

AGACCCGAGCGTCGCTTACGTTGAAGAAGATCACGTAGCA 

CATGCGTACG CGCAGTCCGTGCCTTACGGCGTATCACAAA 

TTAAAGCCCCTGCTCrGrCACTCTCAAGGCTA-CACTGGATC 

AAATGTTAAAGTAGCGGTTATCGACAGCGG'TATCGATTCT 

TCTCATCCTGATTTAAAGGTAGCAGGCGGAGCCAGCATGG 

TTCCTTCTGAAACAAA'TCCTTTCCAAGACAA.CAACrCTCAC 

GGAACTCACGTTGCCGGCACAGTTGCGGCTCTTAATAACT 

CAATCGGTGTATTAGGCXjTTGCXKX^AAGaSCATCACTTTA 

CGCTGTAAAAGTTCTCGGTGCTGACGGTTCCGGCCAATAC 

AGCTGGATCATTAACGGAATCGAGTGGGCGATCGCAAACA 

ATATGGACGTTATTAACATGAGCCTCGGCGGACCTTCTGG 

TTCrroCrGCTTTAAAACK:GGCAGTTGATAAA^GCCGTTGCA 

TCCGGCGTCGTAGTCGXTGCGGCAGCCGGTAACGAAGGCA 

cttccggcagctcaagcacagtgggctacccrrggtaaata 

cccttctgtcattgcagi^taggcgctgttgacagcagcaac 

caaagagcatctttcrcaagcgtaggacctgagcttgatg 

tcatggcacctggcx3tatctatccaaagcacgcttcctgg 

aaacaaatacggggcc5tacaacggtacgtcaatggcatct 

ccocacgttgccggagcx3gctgcttroattctttctaagca 

cccgaactcgacaaacactcaagtccgcag<;agtttagaa 

aacaccactacaaaacttggtgattctttcttactatggaa 

AAGGGCTGATCAACGTACAGGCGGCAGCTCAGTAA 